Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1912.24%)
Janitor: simple tools for data cleaning in R
Stars: ✭ 981 (+1902.04%)
Drugs Recommendation Using Reviews: Analyzes drug descriptions, conditions, and reviews, then recommends drugs for each patient health condition using deep learning models.
Stars: ✭ 35 (-28.57%)
Data Forge Ts: The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+1873.47%)
Boltzmannclean: Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-53.06%)
Pandera: A light-weight, flexible, and expressive pandas data validation library
Stars: ✭ 506 (+932.65%)
Nonechucks: Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Stars: ✭ 304 (+520.41%)
Validate: Professional data validation for the R environment
Stars: ✭ 268 (+446.94%)
Dirty cat: Encoding methods for dirty categorical variables
Stars: ✭ 259 (+428.57%)
covid-19-data-cleanup: Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19
Stars: ✭ 25 (-48.98%)
nepali-translator: Neural Machine Translation on the Nepali-English language pair
Stars: ✭ 29 (-40.82%)
allie: 🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (+89.8%)
foofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (-51.02%)
bumblebee: 🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
Stars: ✭ 120 (+144.9%)
errorlocate: Find and replace erroneous fields in data using validation rules
Stars: ✭ 19 (-61.22%)
objectiv-analytics: Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+714.29%)
exemplary-ml-pipeline: Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-53.06%)
Cleaner.jl: A toolbox of simple solutions for common data cleaning problems.
Stars: ✭ 21 (-57.14%)
FIFA-2019-Analysis: A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, and countries using data analysis and data visualization.
Stars: ✭ 28 (-42.86%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+2657.14%)
R-Learning-Journey: Some of the projects I made when starting to learn R for data science at university
Stars: ✭ 19 (-61.22%)
Miller: like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+9355.1%)
Voicebook: 🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Stars: ✭ 236 (+381.63%)
Klib: Easy-to-use Python library of customized functions for cleaning and analyzing data.
Stars: ✭ 192 (+291.84%)
Machine Learning Workflow With Python: A comprehensive walkthrough of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, evaluation
Stars: ✭ 157 (+220.41%)
Cleanlab: The standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.
Stars: ✭ 2,526 (+5055.1%)
Datamaid: An R package for data screening
Stars: ✭ 120 (+144.9%)
Pandas Videos: Jupyter notebooks and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (+3402.04%)
Dat8: General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+2993.88%)
Refinr: Cluster and merge similar character values: an R implementation of OpenRefine clustering algorithms
Stars: ✭ 91 (+85.71%)
Bumblebee: 🚕 A spreadsheet-like data preparation web app that works over Optimus (pandas, dask, cuDF, dask-cuDF and PySpark)
Stars: ✭ 86 (+75.51%)