Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Easy-to-use Python library of customized functions for cleaning and analyzing data.
Machine Learning Workflow With Python
A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation.
The standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.
Jupyter notebook and datasets from the pandas Q&A video series
General Assembly's 2015 Data Science course in Washington, DC
Cluster and merge similar character values: an R implementation of OpenRefine clustering algorithms
🚕 A spreadsheet-like data preparation web app that works over Optimus (pandas, dask, cuDF, dask-cuDF and PySpark)
Fast and Easy Data Cleaning (in R)
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Simple tools for data cleaning in R
Drugs Recommendation Using Reviews
Analyzes drug descriptions, conditions, and reviews, then recommends drugs for each patient health condition using deep learning models.
Data Forge Ts
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
A lightweight, flexible, and expressive pandas data validation library
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Professional data validation for the R environment
Encoding methods for dirty categorical variables
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Foofah: programming-by-example data transformation program synthesizer
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Find and replace erroneous fields in data using validation rules
Powerful product analytics for data teams, with full control over data & models.
A toolbox of simple solutions for common data cleaning problems.
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, and countries using data analysis and data visualization.