Top 60 data-processing open source projects

Pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Vaspy
Manipulating VASP files with Python.
Texar
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Pulsar Flink
Elastic data processing with Apache Pulsar and Apache Flink
Data Processing Agreements
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
Distributed Dataset
A distributed data processing framework in Haskell.
Bash Oneliner
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Machine Learning For Solar Energy Prediction
Predict the power production of a solar panel farm from weather measurements using machine learning
Broadway
Concurrent and multi-stage data ingestion and data processing with Elixir
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
2019 Electronic Design Competition
[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper-count detection device (based on STM32F407, FDC2214, and USART HMI)
Cbrain
CBRAIN is a flexible Ruby on Rails framework for accessing and processing of large data on high-performance computing infrastructures.
Tdm
R package for normalizing RNA-seq data to make them comparable to microarray data.
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Dali
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Rapidtables
Super-fast Python 2/3 library for converting lists of dicts into pre-formatted tables
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
prosto
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby
baleen3
Baleen 3 is a data processing tool based on the Annot8 framework
pulserl
Apache Pulsar client library for Erlang/Elixir
alfa
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
meta-schema
Little DSL to make data processing sane with clojure.spec and spec-tools
pyGAPS
A framework for processing adsorption data and isotherm fitting
stargate
An Apache Pulsar client written in Elixir
machine-learning-data-pipeline
Pipeline module for parallel, real-time data processing, for machine learning model development and production use.
processor
A simple and lightweight JavaScript data processing tool. Live demo:
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library