prosto: Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations on functions, as an alternative to map-reduce and join-groupby.
Stars: ✭ 54 (+157.14%)
pandas-workshop: An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+666.67%)
Data-Science-101: Notes and tutorials on how to use Python, pandas, seaborn, numpy, matplotlib, and scipy for data science.
Stars: ✭ 19 (-9.52%)
SMMT: Social Media Mining Toolkit (SMMT) main repository
Stars: ✭ 116 (+452.38%)
Data-Wrangling-with-Python: Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+328.57%)
whyqd: Data wrangling with simplicity, complete audit transparency, and speed
Stars: ✭ 16 (-23.81%)
timit-preprocessor: Extract MFCC vectors and phones from the TIMIT dataset
Stars: ✭ 14 (-33.33%)
modelscript: REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
Stars: ✭ 40 (+90.48%)
pyrefine: Execute OpenRefine JSON scripts without OpenRefine (or Java)
Stars: ✭ 25 (+19.05%)
Data-Analyst-Nanodegree: This repo consists of the projects that I completed as part of Udacity's Data Analyst Nanodegree curriculum.
Stars: ✭ 13 (-38.1%)
machine-learning-data-pipeline: Pipeline module for parallel real-time data processing, for machine learning model development and production use.
Stars: ✭ 22 (+4.76%)
sciblox: sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+128.57%)
nuts-ml: Flow-based data pre-processing for deep learning
Stars: ✭ 32 (+52.38%)
qsv: CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+1985.71%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+6333.33%)
Datatest: Tools for test driven data-wrangling and data validation.
Stars: ✭ 238 (+1033.33%)
R Ecology Lesson: Data Analysis and Visualization in R for Ecologists
Stars: ✭ 218 (+938.1%)
Qsacnpj: Package that processes and organizes the data of the Cadastro Nacional da Pessoa Jurídica (CNPJ), Brazil's national registry of legal entities
Stars: ✭ 187 (+790.48%)
Sjmisc: Data transformation and utility functions for R
Stars: ✭ 141 (+571.43%)
Data Forge Js: JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 139 (+561.9%)
Hypertools: A Python toolbox for gaining geometric insights into high-dimensional data
Stars: ✭ 1,678 (+7890.48%)
Uc R.github.io: Main repository for R programming courses at the University of Cincinnati, with courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.
Stars: ✭ 76 (+261.9%)
Openrefine: OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+40523.81%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+4595.24%)
Data Forge Ts: The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+4504.76%)
Moderndive book: Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Stars: ✭ 527 (+2409.52%)
Prose: Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
Stars: ✭ 470 (+2138.1%)
Sqawk: Like Awk but with SQL and table joins
Stars: ✭ 263 (+1152.38%)
mimir: Data-ish exploration through SQL+Uncertainty
Stars: ✭ 26 (+23.81%)
foofah: Foofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (+14.29%)
Chapter-2: Code examples for Chapter 2 of Data Wrangling with JavaScript
Stars: ✭ 16 (-23.81%)
SumStatsRehab: QC tool for GWAS summary statistics files
Stars: ✭ 19 (-9.52%)
klar-EDA: A Python library for automated exploratory data analysis
Stars: ✭ 15 (-28.57%)
sparklanes: A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-19.05%)
candock: A time series signal analysis and classification framework
Stars: ✭ 56 (+166.67%)