Top 57 data-processing open source projects

🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Manipulating VASP files with Python.
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project.
Pulsar Flink
Elastic data processing with Apache Pulsar and Apache Flink
Data Processing Agreements
Collection of Data Processing Agreement (DPA) and GDPR compliance resources
Distributed Dataset
A distributed data processing framework in Haskell.
Bash Oneliner
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Machine Learning For Solar Energy Prediction
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
Concurrent and multi-stage data ingestion and data processing with Elixir
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project.
2019 Electronic Design Competition
[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper sheet counting device (based on STM32F407 & FDC2214 & USART HMI)
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
R package for normalizing RNA-seq data to make them comparable to microarray data.
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Texar Pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project.
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Super-fast library for converting a list of dicts to pre-formatted tables, for Python 2/3
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it.
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Baleen 3 is a data processing tool based on the Annot8 framework
Apache Pulsar client library for Erlang/Elixir
♿ Suite of open and standards-based tools for performing reliable accessibility conformance testing at scale
Little DSL to make data processing sane with clojure.spec and spec-tools
A framework for processing adsorption data and isotherm fitting
An Apache Pulsar client written in Elixir
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.