Miller: Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Pxi: 🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Amadeus: Harmonious distributed data analysis in Rust.
Pysparkling: A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Vaspy: Manipulating VASP files with Python.
Collapse: Advanced and Fast Data Transformation in R
Texar: Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Padasip: Python Adaptive Signal Processing
Pulsar Flink: Elastic data processing with Apache Pulsar and Apache Flink
Bonobo: Extract Transform Load for Python 3.5+
Bash Oneliner: A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Broadway: Concurrent and multi-stage data ingestion and data processing with Elixir
Forte: Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Dialogpt: Large-scale pretraining for dialogue
Cbrain: CBRAIN is a flexible Ruby on Rails framework for accessing and processing of large data on high-performance computing infrastructures.
Mdsplus: The MDSplus data management system
Tdm: R package for normalizing RNA-seq data to make them comparable to microarray data.
Data Science On Gcp: Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Texar Pytorch: Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Pandera: A light-weight, flexible, and expressive pandas data validation library
Xidel: Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Dali: A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Nonechucks: Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Rapidtables: Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Hub: Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
prosto: Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
baleen3: Baleen 3 is a data processing tool based on the Annot8 framework
pulserl: Apache Pulsar client library for Erlang/Elixir
alfa: ♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
meta-schema: Little DSL to make data processing sane with clojure.spec and spec-tools
pyGAPS: A framework for processing adsorption data and isotherm fitting
sparklanes: A lightweight data processing framework for Apache Spark
cq: Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more
Speech-Recognition: End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
mech: 🦾 Main repository for the Mech programming language. Start here!
traceml: Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
stargate: An Apache Pulsar client written in Elixir
parallel-corpora-tools: Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Processor: Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
rsgislib: Remote Sensing and GIS Software Library; python module tools for processing spatial data.
processor: A simple and lightweight JavaScript data processing tool. Live demo:
perke: A keyphrase extractor for Persian
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library