Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+43.41%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (-68.13%)
leila
Library for data quality assessment and interaction with the datos.gov.co portal
Stars: ✭ 56 (-99.04%)
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+206.89%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (-92.79%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (-15.31%)
100 Days Of Ml Code
A day-to-day plan for this challenge. Covers both theoretical and practical aspects.
Stars: ✭ 172 (-97.04%)
Data Describe
data⎰describe: Pythonic EDA Accelerator for Data Science
Stars: ✭ 269 (-95.37%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-97.81%)
Dataprep
DataPrep: The easiest way to prepare data in Python
Stars: ✭ 639 (-89%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.64%)
Xda
R package for exploratory data analysis
Stars: ✭ 112 (-98.07%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (-73.98%)
D6t Python
Accelerate data science
Stars: ✭ 118 (-97.97%)
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (-58.94%)
Steppy
Lightweight Python library for fast and reproducible experimentation 🔬
Stars: ✭ 119 (-97.95%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-97.83%)
Dataexplorer
Automate Data Exploration and Treatment
Stars: ✭ 362 (-93.77%)
Batchflow
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Stars: ✭ 156 (-97.31%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-97.38%)
Learn Something Every Day
📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (-93.77%)
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-97.02%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-97.64%)
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-97.14%)
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (-96.63%)
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-96.11%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-98.12%)
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-97.97%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+634.06%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-97.9%)
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-99.71%)
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-98.35%)
Bodywork Core
Deploy machine learning projects developed in Python to Kubernetes. Accelerated MLOps 🚀
Stars: ✭ 145 (-97.5%)
Kedro
A Python framework for creating reproducible, maintainable and modular data science code.
Stars: ✭ 4,764 (-17.98%)
dqlab-career-track
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (-99.09%)
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (-99.12%)
traceml
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
Stars: ✭ 445 (-92.34%)
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Stars: ✭ 223 (-96.16%)
krsh
A declarative KubeFlow Management Tool
Stars: ✭ 127 (-97.81%)
skimpy
skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
Stars: ✭ 236 (-95.94%)
loon
A Toolkit for Interactive Statistical Data Visualization
Stars: ✭ 45 (-99.23%)
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-99%)
great expectations action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (-98.86%)
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-97.52%)
popmon
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Stars: ✭ 434 (-92.53%)
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-98.88%)
kedro
A Python framework for creating reproducible, maintainable and modular data science code.
Stars: ✭ 6,068 (+4.48%)
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (-48.93%)
olliePy
OlliePy is a Python package that helps data scientists explore their data and evaluate and analyse their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information, and OlliePy will generate the rest.
Stars: ✭ 46 (-99.21%)
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (-31.08%)
Autoeda Resources
A list of software and papers related to automatic and fast Exploratory Data Analysis
Stars: ✭ 268 (-95.39%)
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.
Stars: ✭ 86 (-98.52%)
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (-77.6%)
Ploomber
A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (-96.19%)