1500 open source projects that are alternatives to or similar to Great_expectations

Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+43.41%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (-68.13%)
leila
Library for data quality assessment and for interacting with the datos.gov.co data portal
Stars: ✭ 56 (-99.04%)
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+206.89%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (-92.79%)
Mutual labels:  data-quality, data-profiling, mlops
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (-15.31%)
100 Days Of Ml Code
A day-to-day plan for this challenge. Covers both theoretical and practical aspects.
Stars: ✭ 172 (-97.04%)
Data Describe
data⎰describe: Pythonic EDA Accelerator for Data Science
Stars: ✭ 269 (-95.37%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-97.81%)
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-97.59%)
Dataprep
DataPrep — The easiest way to prepare data in Python
Stars: ✭ 639 (-89%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.64%)
Xda
R package for exploratory data analysis
Stars: ✭ 112 (-98.07%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (-73.98%)
Mutual labels:  data-science, data-engineering
D6t Python
Accelerate data science
Stars: ✭ 118 (-97.97%)
Mutual labels:  data-science, data-engineering
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (-58.94%)
Mutual labels:  data-science, data-engineering
Steppy
Lightweight Python library for fast and reproducible experimentation 🔬
Stars: ✭ 119 (-97.95%)
Mutual labels:  data-science, pipeline
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-97.83%)
Mutual labels:  data-science, data-engineering
Dataexplorer
Automate Data Exploration and Treatment
Stars: ✭ 362 (-93.77%)
Mutual labels:  data-science, eda
Batchflow
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Stars: ✭ 156 (-97.31%)
Mutual labels:  data-science, pipeline
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-97.38%)
Mutual labels:  data-science, data-engineering
Learn Something Every Day
📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (-93.77%)
Mutual labels:  data-science, data-engineering
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-97.02%)
Mutual labels:  data-science, data-engineering
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-97.64%)
Mutual labels:  data-science, data-engineering
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-97.14%)
Mutual labels:  data-science, data-engineering
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (-96.63%)
Mutual labels:  data-science, pipeline
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-96.11%)
Mutual labels:  data-science, data-engineering
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-98.12%)
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-97.97%)
Mutual labels:  data-science, pipeline
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+634.06%)
Mutual labels:  data-science, data-engineering
Open Solution Salt Identification
Open solution to the TGS Salt Identification Challenge
Stars: ✭ 124 (-97.87%)
Mutual labels:  data-science, pipeline
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-97.9%)
Mutual labels:  data-science, data-engineering
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-99.71%)
Mutual labels:  data-engineering, data-quality
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-98.35%)
Mutual labels:  data-science, pipeline
Open Solution Toxic Comments
Open solution to the Toxic Comment Classification Challenge
Stars: ✭ 154 (-97.35%)
Mutual labels:  data-science, pipeline
Bodywork Core
Deploy machine learning projects developed in Python to Kubernetes. Accelerated MLOps 🚀
Stars: ✭ 145 (-97.5%)
Mutual labels:  data-science, pipeline
Kedro
A Python framework for creating reproducible, maintainable and modular data science code.
Stars: ✭ 4,764 (-17.98%)
Mutual labels:  pipeline, mlops
dqlab-career-track
A collection of scripts written to complete DQLab Data Analyst Career Track 📊
Stars: ✭ 53 (-99.09%)
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (-99.12%)
Mutual labels:  exploratory-data-analysis, eda
traceml
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
Stars: ✭ 445 (-92.34%)
Mutual labels:  data-profiling, mlops
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Stars: ✭ 223 (-96.16%)
Mutual labels:  data-science, pipeline
krsh
A declarative KubeFlow Management Tool
Stars: ✭ 127 (-97.81%)
Mutual labels:  pipeline, mlops
skimpy
skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
Stars: ✭ 236 (-95.94%)
Mutual labels:  exploratory-data-analysis, eda
Exploratory Data Analysis Visualization Python
Data analysis and visualization with the PyData ecosystem: Pandas, Matplotlib, NumPy, and Seaborn
Stars: ✭ 78 (-98.66%)
Mutual labels:  exploratory-data-analysis, eda
loon
A Toolkit for Interactive Statistical Data Visualization
Stars: ✭ 45 (-99.23%)
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-99%)
Mutual labels:  data-engineering, data-quality
great expectations action
A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.
Stars: ✭ 66 (-98.86%)
Mutual labels:  data-quality, mlops
bodywork-ml-pipeline-project
Deployment template for a continuous training pipeline.
Stars: ✭ 22 (-99.62%)
Mutual labels:  pipeline, mlops
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-97.52%)
Mutual labels:  data-engineering, data-quality
popmon
Monitor the stability of a Pandas or Spark dataframe ⚙︎
Stars: ✭ 434 (-92.53%)
Mutual labels:  data-profiling, mlops
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-98.88%)
Mutual labels:  data-engineering, mlops
kedro
A Python framework for creating reproducible, maintainable and modular data science code.
Stars: ✭ 6,068 (+4.48%)
Mutual labels:  pipeline, mlops
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (-48.93%)
Mutual labels:  data-science, mlops
olliePy
OlliePy is a Python package that helps data scientists explore their data and evaluate and analyse their machine learning experiments by utilising the power and structure of modern web applications. The data scientist only needs to provide the data and any required information, and OlliePy will generate the rest.
Stars: ✭ 46 (-99.21%)
Mutual labels:  exploratory-data-analysis, eda
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (-31.08%)
Mutual labels:  data-science, mlops
Autoeda Resources
A list of software and papers related to automatic and fast Exploratory Data Analysis
Stars: ✭ 268 (-95.39%)
Mutual labels:  exploratory-data-analysis, eda
Open Solution Mapping Challenge
Open solution to the Mapping Challenge 🌎
Stars: ✭ 291 (-94.99%)
Mutual labels:  data-science, pipeline
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.
Stars: ✭ 86 (-98.52%)
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (-77.6%)
Mutual labels:  data-science, pipeline
Ploomber
A convention-over-configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (-96.19%)
Mutual labels:  data-science, data-engineering