Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Harmonious distributed data analysis in Rust.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Manipulating VASP files with Python.
Advanced and Fast Data Transformation in R
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Python Adaptive Signal Processing
Elastic data processing with Apache Pulsar and Apache Flink
Extract, Transform, Load for Python 3.5+
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Concurrent and multi-stage data ingestion and data processing with Elixir
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Large-scale pretraining for dialogue
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
The MDSplus data management system
R package for normalizing RNA-seq data to make them comparable to microarray data.
Data Science on GCP
Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
A light-weight, flexible, and expressive pandas data validation library
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Baleen 3 is a data processing tool based on the Annot8 framework
Apache Pulsar client library for Erlang/Elixir
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
Little DSL to make data processing sane with clojure.spec and spec-tools
A framework for processing adsorption data and isotherm fitting
A lightweight data processing framework for Apache Spark
Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
🦾 Main repository for the Mech programming language. Start here!
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
An Apache Pulsar client written in Elixir
Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Ontology-driven Linked Data processor and server for SPARQL backends. Apache License.
Remote Sensing and GIS Software Library; Python module tools for processing spatial data.