optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+5529.17%)
allie
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (+287.5%)
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+3929.17%)
bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Stars: ✭ 120 (+400%)
prosto
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions, as an alternative to map-reduce and join-groupby.
Stars: ✭ 54 (+125%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+4008.33%)
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+1725%)
whyqd
Data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-33.33%)
php-serializer
Serialize PHP variables, including objects, in any format. Supports unserializing them too.
Stars: ✭ 47 (+95.83%)
pycsvw
A tool to read CSV files with CSVW metadata and transform them into other formats.
Stars: ✭ 32 (+33.33%)
datapackage-m
Power Query M functions for working with Tabular Data Packages (Frictionless Data) in Power BI and Excel
Stars: ✭ 26 (+8.33%)
reskit
A library for creating and curating reproducible pipelines for scientific and industrial machine learning
Stars: ✭ 27 (+12.5%)
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+83.33%)
R-Learning-Journey
Some of the projects I made when starting to learn R for data science at university
Stars: ✭ 19 (-20.83%)
pyrefine
Execute OpenRefine JSON scripts without OpenRefine (or Java)
Stars: ✭ 25 (+4.17%)
serializer-benchmark
A PHP benchmark application to compare PHP serializer libraries
Stars: ✭ 14 (-41.67%)
objectiv-analytics
Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+1562.5%)
R Ecology Lesson
Data Analysis and Visualization in R for Ecologists
Stars: ✭ 218 (+808.33%)
daany
Daany (.NET DAta ANalYtics): a .NET library implementing DataFrame, time series decomposition, and linear algebra routines (BLAS and LAPACK).
Stars: ✭ 49 (+104.17%)
fastverse
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
Stars: ✭ 123 (+412.5%)
Chapter-2
Code examples for Chapter 2 of Data Wrangling with JavaScript
Stars: ✭ 16 (-33.33%)
pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+570.83%)
Data-Analyst-Nanodegree
This repo contains the projects that I completed as part of Udacity's Data Analyst Nanodegree curriculum.
Stars: ✭ 13 (-45.83%)
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+275%)
Cleaner.jl
A toolbox of simple solutions for common data cleaning problems.
Stars: ✭ 21 (-12.5%)
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+162.5%)
FIFA-2019-Analysis
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries and related topics using data analysis and data visualization
Stars: ✭ 28 (+16.67%)
Semantic-Bus
Object flow treatment, data transformation
Stars: ✭ 49 (+104.17%)
tutorials
Short programming tutorials pertaining to data analysis.
Stars: ✭ 14 (-41.67%)
dynamic.yaml
DEPRECATED: YAML-based data transformations
Stars: ✭ 14 (-41.67%)
LDWizard
A generic framework for simplifying the creation of linked data.
Stars: ✭ 17 (-29.17%)
richflow
A Node.js and JavaScript library for synchronous data pipeline processing, data sharing and stream processing. Actionable and transformable pipeline data processing.
Stars: ✭ 17 (-29.17%)
clojure-dsl-resources
A curated list of Clojure resources for dealing with domain-specific languages.
Stars: ✭ 99 (+312.5%)
errorlocate
Find and replace erroneous fields in data using validation rules
Stars: ✭ 19 (-20.83%)
Data Forge Js
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 139 (+479.17%)
xplore
A Python package for data scientists, analysts and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.
Stars: ✭ 21 (-12.5%)
Datatest
Tools for test-driven data wrangling and data validation.
Stars: ✭ 238 (+891.67%)
zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+2629.17%)
Qsacnpj
Package that processes and organizes the data from the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
Stars: ✭ 187 (+679.17%)
Sjmisc
Data transformation and utility functions for R
Stars: ✭ 141 (+487.5%)
exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-4.17%)
Data-Science-101
Notes and tutorials on how to use Python, pandas, seaborn, NumPy, Matplotlib and SciPy for data science.
Stars: ✭ 19 (-20.83%)
Hypertools
A Python toolbox for gaining geometric insights into high-dimensional data
Stars: ✭ 1,678 (+6891.67%)
naas
⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+812.5%)
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning model development and production.
Stars: ✭ 22 (-8.33%)
pipe envy
Elixir style pipe operator for Ruby
Stars: ✭ 46 (+91.67%)
cq
Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more
Stars: ✭ 111 (+362.5%)