deepchecks / deepchecks

Licence: other
Test Suites for Validating ML Models & Data. Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to deepchecks

Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+61.5%)
Mutual labels:  ml, mlops
Pandera
A light-weight, flexible, and expressive pandas data validation library
Stars: ✭ 506 (-68.28%)
Mutual labels:  data-validation, pandas-dataframe
Bentoml
Model Serving Made Easy
Stars: ✭ 3,064 (+92.1%)
Mutual labels:  ml, mlops
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+150.97%)
Mutual labels:  ml, mlops
cli
Polyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-98.87%)
Mutual labels:  ml, mlops
Metaflow
🚀 Build and manage real-life data science projects with ease!
Stars: ✭ 5,108 (+220.25%)
Mutual labels:  ml, mlops
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+422.19%)
Mutual labels:  pandas-dataframe, html-report
Evidently
Interactive reports to analyze machine learning models during validation or production monitoring.
Stars: ✭ 304 (-80.94%)
Mutual labels:  pandas-dataframe, html-report
VickyBytes
Subscribe to this GitHub repo to access the latest tech talks, tech demos, learning materials & modules, and developer community updates!
Stars: ✭ 48 (-96.99%)
Mutual labels:  ml, mlops
metaflowbot
Slack bot for monitoring your Metaflow flows!
Stars: ✭ 22 (-98.62%)
Mutual labels:  ml, mlops
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+85.96%)
Mutual labels:  ml, mlops
oomstore
Lightweight and Fast Feature Store Powered by Go (and Rust).
Stars: ✭ 76 (-95.24%)
Mutual labels:  ml, mlops
vertex-ai-samples
Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud
Stars: ✭ 270 (-83.07%)
Mutual labels:  ml, mlops
Awesome Mlops
A curated list of references for MLOps
Stars: ✭ 7,119 (+346.33%)
Mutual labels:  ml, mlops
hi-ml
HI-ML toolbox for deep learning for medical imaging and Azure integration
Stars: ✭ 150 (-90.6%)
Mutual labels:  ml, mlops
neptune-client
📒 Experiment tracking tool and model registry
Stars: ✭ 348 (-78.18%)
Mutual labels:  ml, mlops
objectiv-analytics
Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (-74.98%)
Mutual labels:  data-validation, pandas-dataframe
NimbusML-Samples
Samples for NimbusML, a Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.
Stars: ✭ 31 (-98.06%)
Mutual labels:  ml
informatica-public
Public code developed during my MSc study at University of Bologna
Stars: ✭ 79 (-95.05%)
Mutual labels:  ml
aws-ai-ml-workshop-kr
A collection of localized (Korean) AWS AI/ML workshop materials for hands-on labs.
Stars: ✭ 65 (-95.92%)
Mutual labels:  ml

Testing and Validating ML Models & Data

🧐 What is Deepchecks?

Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort. This includes checks related to various types of issues, such as model performance, data integrity, distribution mismatches, and more.

🖼️ Computer Vision & 🔢 Tabular Support

This README refers to the Tabular version of deepchecks.

Check out the Deepchecks for Computer Vision & Images subpackage for more details about deepchecks for CV, currently in beta release.

💻 Installation

Using pip

pip install deepchecks -U --user

Note: Computer Vision Install

To install deepchecks together with the Computer Vision Submodule that is currently in beta release, replace deepchecks with "deepchecks[vision]" as follows.

pip install "deepchecks[vision]" -U --user

Using conda

conda install -c conda-forge deepchecks
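
To confirm that the installation succeeded, you can query the installed version from Python. This is a minimal sketch using only the standard library; the helper name installed_version is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("deepchecks")
    if version is None:
        print("deepchecks is not installed in this environment")
    else:
        print(f"deepchecks {version} is installed")
```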

Try it Out!

🏃‍♀️ See It in Action

Head over to one of the following quickstart tutorials and have deepchecks running in your environment in less than 5 minutes:

Recommended: download the code and run it locally on the built-in dataset and (optional) model, or replace them with your own.

🚀 See Our Checks Demo

Play with some of the existing checks in our Interactive Checks Demo, and see how they work on various datasets with custom corruptions injected.

📊 Usage Examples

Running a Suite

A Suite runs a collection of Checks with optional Conditions added to them.

Example for running a suite on given datasets and with a supported model:

from deepchecks.tabular.suites import model_evaluation
suite = model_evaluation()
result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)
result.show()

This produces a report summarizing the results of all the checks that ran in the suite.

Note:

  • Results can also be saved as an HTML report, saved as JSON, or exported to other tools (e.g., Weights & Biases - wandb)
  • Other suites that run only on the data (data_integrity, train_test_validation) don't require a model as part of the input.
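
The export options in the note above can be wrapped in a small helper. This sketch assumes that the object returned by suite.run(...) exposes save_as_html(path) and to_json(); check that assumption against the API Reference for your installed version. The helper name export_result and the file names are hypothetical.

```python
from pathlib import Path


def export_result(result, output_dir: str = "deepchecks_reports") -> dict:
    """Save a suite result as HTML and JSON files and return their paths.

    `result` is assumed to provide `save_as_html(path)` and `to_json()`.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    html_path = out / "report.html"
    json_path = out / "report.json"

    # Some versions may return the path that was actually written; fall back
    # to the requested path when nothing is returned.
    saved = result.save_as_html(str(html_path))
    # Keep a machine-readable copy next to the HTML report.
    json_path.write_text(result.to_json(), encoding="utf-8")

    return {"html": Path(saved) if saved else html_path, "json": json_path}
```

With the suite example above, calling export_result(result) would write both files under deepchecks_reports/.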

See the full code tutorials here.

In the following section you can see an example of how the output of a single check without a condition may look.

Running a Check

To run a single check, import it and run it with the required (check-dependent) input parameters. More details about the existing checks and the parameters they accept can be found in our API Reference.

from deepchecks.tabular.checks import TrainTestFeatureDrift
import pandas as pd

train_df = pd.read_csv('train_data.csv')
test_df = pd.read_csv('test_data.csv')
# Initialize and run desired check
TrainTestFeatureDrift().run(train_df, test_df)

This produces output of the following type:

Train Test Drift

The Drift score is a measure for the difference between two distributions, in this check - the test and train distributions.
The check shows the drift score and distributions for the features, sorted by feature importance and showing only the top 5 features, according to feature importance. If available, the plot titles also show the feature importance (FI) rank.
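
To make the idea of a drift score concrete, here is a minimal sketch of one widely used measure for categorical features, the Population Stability Index (PSI). It is an illustration only: this README does not state which measure TrainTestFeatureDrift computes, and the function name, the smoothing constant eps, and the sample data are assumptions of this example.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def population_stability_index(
    train_values: Sequence[Hashable],
    test_values: Sequence[Hashable],
    eps: float = 1e-6,
) -> float:
    """PSI between two categorical samples; 0.0 means identical proportions."""
    if not train_values or not test_values:
        raise ValueError("both samples must be non-empty")

    train_counts = Counter(train_values)
    test_counts = Counter(test_values)
    score = 0.0
    for category in set(train_counts) | set(test_counts):
        # Clamp proportions to eps so unseen categories do not cause log(0).
        p = max(train_counts[category] / len(train_values), eps)
        q = max(test_counts[category] / len(test_values), eps)
        score += (q - p) * math.log(q / p)
    return score


train = ["red"] * 50 + ["blue"] * 50
test_same = ["red"] * 50 + ["blue"] * 50
test_shifted = ["red"] * 90 + ["blue"] * 10

print(population_stability_index(train, test_same))     # 0.0
print(population_stability_index(train, test_shifted))  # roughly 0.879
```

Every term in the sum is non-negative, so the score grows as the category proportions in the two samples move apart.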

🙋🏼 When Should You Use Deepchecks?

Use deepchecks while you're in the research phase and want to validate your data, find potential methodological problems, and/or validate and evaluate your model.

See more about typical usage scenarios and the built-in suites in the docs.

🗝️ Key Concepts

Check

Each check enables you to inspect a specific aspect of your data and models. Checks are the basic building blocks of the deepchecks package, covering all kinds of common issues, such as:

  • Model Error Analysis
  • Label Ambiguity
  • Data Sample Leakage

and many more checks.

Each check can have two types of results:

  1. A visual result meant for display (e.g. a figure or a table).
  2. A return value that can be used for validating the expected check results (validations are typically done by adding a "condition" to the check, as explained below).

Condition

A condition is a function that can be added to a Check and returns a pass (✓), fail, or warning (!) result, intended for validating the Check's return value. An example of adding a condition:

from deepchecks.tabular.checks import BoostingOverfit
BoostingOverfit().add_condition_test_score_percent_decline_not_greater_than(threshold=0.05)

When run, this check fails if the score achieved in the last boosting iteration (the model's "original" score on the test set) is more than 5% lower than the best score achieved on the test set during the boosting iterations.
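
The rule described above can be sketched in plain Python. This is not deepchecks' implementation; it only reproduces the arithmetic as stated, assuming the decline is measured relative to the best test score. The names percent_decline and condition_passes are hypothetical.

```python
from typing import Sequence


def percent_decline(test_scores: Sequence[float]) -> float:
    """Relative drop from the best test score to the final-iteration score."""
    if not test_scores:
        raise ValueError("test_scores must not be empty")
    best = max(test_scores)
    last = test_scores[-1]
    if best == 0:
        raise ValueError("best score is zero, relative decline is undefined")
    return (best - last) / abs(best)


def condition_passes(test_scores: Sequence[float], threshold: float = 0.05) -> bool:
    """True when the decline does not exceed the threshold."""
    return percent_decline(test_scores) <= threshold


# Test-set scores recorded after each boosting iteration (made-up numbers).
overfit_scores = [0.80, 0.90, 0.84]   # decline of about 6.7% from the best score
stable_scores = [0.80, 0.90, 0.89]    # decline of about 1.1% from the best score

print(condition_passes(overfit_scores))  # False
print(condition_passes(stable_scores))   # True
```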

Suite

A suite is an ordered collection of checks, which can have conditions added to them. The Suite enables displaying a concluding report for all of the Checks that ran.

See the list of predefined suites for tabular data to learn more about the suites you can use directly, and for a code example showing how to build your own custom suite.

The existing suites include default conditions added for most of the checks. You can edit the preconfigured suites or build a suite of your own with a collection of checks and optional conditions.
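
To show how the three concepts fit together, the following toy model implements a check, a condition, and a suite in a few lines of standard-library Python. None of these classes are part of deepchecks: ToyCheck, ToySuite, and the two example checks are invented for illustration, and the real package provides far richer results and display options.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ToyCheck:
    name: str
    compute: Callable[[List[Any]], float]  # produces the check's return value
    conditions: List[Tuple[str, Callable[[float], bool]]] = field(default_factory=list)

    def add_condition(self, description: str, predicate: Callable[[float], bool]) -> "ToyCheck":
        self.conditions.append((description, predicate))
        return self  # returning self allows chaining

    def run(self, data: List[Any]):
        value = self.compute(data)
        outcomes = [(desc, pred(value)) for desc, pred in self.conditions]
        return value, outcomes


@dataclass
class ToySuite:
    name: str
    checks: List[ToyCheck]

    def run(self, data: List[Any]) -> List[str]:
        report = []
        for check in self.checks:  # checks run in the order they were given
            value, outcomes = check.run(data)
            if not outcomes:
                report.append(f"{check.name}: value={value:.2f} (no condition)")
            for desc, passed in outcomes:
                status = "PASS" if passed else "FAIL"
                report.append(f"{check.name}: {desc} -> {status} (value={value:.2f})")
        return report


null_ratio = ToyCheck("null ratio", lambda rows: sum(r is None for r in rows) / len(rows))
null_ratio.add_condition("null ratio <= 0.10", lambda v: v <= 0.10)
row_count = ToyCheck("row count", lambda rows: float(len(rows)))

suite = ToySuite("toy data integrity", [null_ratio, row_count])
for line in suite.run([1, None, 3, 4]):
    print(line)
```

The check without a condition still reports its value, while the check with a condition also yields a pass or fail line in the concluding report.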

🤔 What Do You Need in Order to Start Validating?

Environment

  • The deepchecks package installed
  • JupyterLab or Jupyter Notebook or any Python IDE

Data / Model

Depending on your phase and what you wish to validate, you'll need a subset of the following:

  • Raw data (before pre-processing such as OHE, string processing, etc.), with optional labels
  • The model's training data with labels
  • Test data (which the model isn't exposed to) with labels
  • A supported model (e.g. scikit-learn models, XGBoost, any model implementing the predict method in the required format)
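
For the last item, a custom model mainly needs a predict method. The sketch below assumes the scikit-learn convention (a 2D collection of feature rows in, one prediction per row out). The exact format deepchecks requires is not spelled out in this README, so treat that as an assumption and consult the documentation; some checks may need more than predict. MajorityClassModel is a hypothetical baseline written for this example.

```python
from collections import Counter
from typing import Any, List, Sequence


class MajorityClassModel:
    """Baseline classifier that always predicts the most frequent training label."""

    def fit(self, X: Sequence[Sequence[Any]], y: Sequence[Any]) -> "MajorityClassModel":
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of rows")
        if len(y) == 0:
            raise ValueError("cannot fit on an empty dataset")
        # Remember the single most common label seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Sequence[Any]]) -> List[Any]:
        # One prediction per input row, regardless of the feature values.
        return [self.majority_ for _ in X]


model = MajorityClassModel().fit([[1, 2], [3, 4], [5, 6]], ["a", "b", "b"])
print(model.predict([[0, 0], [9, 9]]))  # ['b', 'b']
```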

Supported Data Types

The package currently supports tabular data and is in beta release for the Computer Vision subpackage.

📖 Documentation

👭 Community

  • Join our Slack Community to connect with the maintainers and follow users and interesting discussions
  • Post a GitHub Issue to suggest improvements, report a problem, or share feedback.