Top 37 data-cleaning open source projects

Voicebook
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Klib
Easy to use Python library of customized functions for cleaning and analyzing data.
Machine Learning Workflow With Python
A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, and evaluation
Datamaid
An R package for data screening
Pandas Videos
Jupyter notebook and datasets from the pandas Q&A video series
Refinr
Cluster and merge similar character values: an R implementation of OpenRefine clustering algorithms
Bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (pandas, dask, cuDF, dask-cuDF and PySpark)
Clean
Fast and Easy Data Cleaning (in R)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Drugs Recommendation Using Reviews
Analyzes drug descriptions, conditions, and reviews, then recommends drugs for each patient health condition using deep learning models.
Moodle Local datacleaner
Reduce, filter, and anonymize Moodle data for non-production environments
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Validate
Professional data validation for the R environment
Dirty cat
Encoding methods for dirty categorical variables
covid-19-data-cleanup
Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19
nepali-translator
Neural Machine Translation on the Nepali-English language pair
bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
errorlocate
Find and replace erroneous fields in data using validation rules
Cleaner.jl
A toolbox of simple solutions for common data cleaning problems.
FIFA-2019-Analysis
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, and countries using data analysis and data visualization
R-Learning-Journey
Some of the projects I made when starting to learn R for data science at university