facebookresearch / torcharrow

License: BSD-3-Clause
High performance model preprocessing library on PyTorch

Programming Languages

Python
139335 projects - #7 most used programming language
C++
36643 projects - #6 most used programming language
Jupyter Notebook
11667 projects
Shell
77523 projects
CMake
9771 projects

Projects that are alternatives of or similar to torcharrow

contextualSpellCheck
✔️ Contextual word checker for better suggestions
Stars: ✭ 274 (-51.59%)
Mutual labels:  preprocessing
tweets-preprocessor
Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team
Stars: ✭ 26 (-95.41%)
Mutual labels:  preprocessing
preprocess-conll05
Scripts for preprocessing the CoNLL-2005 SRL dataset.
Stars: ✭ 17 (-97%)
Mutual labels:  preprocessing
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-97%)
Mutual labels:  preprocessing
veridical-flow
Making it easier to build stable, trustworthy data-science pipelines.
Stars: ✭ 28 (-95.05%)
Mutual labels:  preprocessing
pywedge
Makes Interactive Chart Widget, Cleans raw data, Runs baseline models, Interactive hyperparameter tuning & tracking
Stars: ✭ 49 (-91.34%)
Mutual labels:  preprocessing
SeqTools
A python library to manipulate and transform indexable data (lists, arrays, ...)
Stars: ✭ 42 (-92.58%)
Mutual labels:  preprocessing
HyperGBM
A full pipeline AutoML tool for tabular data
Stars: ✭ 172 (-69.61%)
Mutual labels:  preprocessing
text-normalizer
Normalize text string
Stars: ✭ 12 (-97.88%)
Mutual labels:  preprocessing
postcss-each
PostCSS plugin to iterate through values
Stars: ✭ 93 (-83.57%)
Mutual labels:  preprocessing
TextDatasetCleaner
🔬 Cleaning datasets of junk (normalization, preprocessing)
Stars: ✭ 27 (-95.23%)
Mutual labels:  preprocessing
Preprocessing-Method-for-STEMI-Detection
Official source code of "Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG"
Stars: ✭ 12 (-97.88%)
Mutual labels:  preprocessing
dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data
Stars: ✭ 71 (-87.46%)
Mutual labels:  preprocessing
MLLabelUtils.jl
Utility package for working with classification targets and label-encodings
Stars: ✭ 30 (-94.7%)
Mutual labels:  preprocessing
podium
Podium: a framework agnostic Python NLP library for data loading and preprocessing
Stars: ✭ 55 (-90.28%)
Mutual labels:  preprocessing
cmip6 preprocessing
Analysis ready CMIP6 data in python the easy way with pangeo tools.
Stars: ✭ 126 (-77.74%)
Mutual labels:  preprocessing
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (-94.17%)
Mutual labels:  preprocessing
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+422.26%)
Mutual labels:  preprocessing
PyTOUGH
A Python library for automating TOUGH2 simulations of subsurface fluid and heat flow
Stars: ✭ 64 (-88.69%)
Mutual labels:  preprocessing
BrainPrep
Preprocessing pipeline on Brain MR Images through FSL and ANTs, including registration, skull-stripping, bias field correction, enhancement and segmentation.
Stars: ✭ 107 (-81.1%)
Mutual labels:  preprocessing

TorchArrow: a data processing library for PyTorch

This library currently does not have a stable release. The API and implementation may change. Future changes may not be backward compatible.

TorchArrow is a torch.Tensor-like Python DataFrame library for data preprocessing in PyTorch models, with two high-level features:

  • DataFrame library (like Pandas) with strong GPU or other hardware acceleration (under development) and PyTorch ecosystem integration.
  • Columnar memory layout based on Apache Arrow with strong variable-width and nested data support (such as string, list, map) and Arrow ecosystem integration.
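The Arrow-style columnar layout mentioned above can be illustrated with a small, self-contained sketch. This is plain Python for explanation only, not TorchArrow's internal API: a variable-width list column stores all child values in one flat buffer plus an offsets buffer, so rows are slices rather than separately allocated objects.

```python
# Illustrative sketch of an Arrow-style layout for a list<int> column.
# Logical column: [[1, 2], [], [3, 4, 5]]

values = [1, 2, 3, 4, 5]   # flat buffer holding every child value
offsets = [0, 2, 2, 5]     # row i spans values[offsets[i]:offsets[i + 1]]

def get_row(i):
    """Reconstruct the i-th list from the flat buffers."""
    return values[offsets[i]:offsets[i + 1]]

assert get_row(0) == [1, 2]
assert get_row(1) == []         # empty row: equal adjacent offsets
assert get_row(2) == [3, 4, 5]
```

Because the data lives in contiguous buffers, the whole column can be scanned or handed to another Arrow-compatible library without copying row by row.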

Installation

You will need Python 3.7 or later. We also highly recommend installing a Miniconda environment.

First, set up an environment. If you are using conda, create a conda environment:

conda create --name torcharrow python=3.7
conda activate torcharrow

Version Compatibility

The following table shows the corresponding torch and torcharrow versions and the supported Python versions.

torch            torcharrow       python
main / nightly   main / nightly   >=3.7, <=3.10
1.13.0           0.2.0            >=3.7, <=3.10
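As a sketch, the supported range in the table can be checked at runtime with a generic interpreter-version test (plain Python, not a TorchArrow utility):

```python
import sys

# Supported Python range for torcharrow 0.2.0 per the table above: >=3.7, <=3.10
supported = (3, 7) <= sys.version_info[:2] <= (3, 10)
print("Python %d.%d supported: %s"
      % (sys.version_info[0], sys.version_info[1], supported))
```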

Colab

Follow the instructions in this Colab notebook.

Nightly Binaries

Experimental nightly binaries for macOS (requires macOS SDK >= 10.15) and Linux (requires glibc >= 2.17) on Python 3.7, 3.8, and 3.9 can be installed via pip wheels:

pip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

From Source

If you are installing from source, you will need Python 3.7 or later and a C++17 compiler.

Get the TorchArrow Source

git clone --recursive https://github.com/pytorch/torcharrow
cd torcharrow
# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive

Install Dependencies

On macOS

Homebrew is required to install development tools on macOS.

# Install dependencies from Brew
brew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent

# Build and install other dependencies
scripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2

On Ubuntu (20.04 or later)

# Install dependencies from APT
apt install -y g++ cmake ccache ninja-build checkinstall \
    libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \
    libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev
# Build and install folly and fmt
scripts/setup-ubuntu.sh

Install TorchArrow

For local development, you can build in debug mode:

DEBUG=1 python setup.py develop

And run unit tests with:

python -m unittest -v

To build and install TorchArrow in release mode:

python setup.py install

License

TorchArrow is BSD licensed, as found in the LICENSE file.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].