
sdv-dev / SDMetrics

Licence: MIT license
Metrics to evaluate quality and efficacy of synthetic datasets.

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to SDMetrics

SDGym
Benchmarking synthetic data generation methods.
Stars: ✭ 177 (+164.18%)
Mutual labels:  synthetic-data
elm-review
Analyzes Elm projects, to help find mistakes before your users find them.
Stars: ✭ 195 (+191.04%)
Mutual labels:  quality
Three-Filters-to-Normal
Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator (RAL+ICRA'21)
Stars: ✭ 41 (-38.81%)
Mutual labels:  synthetic-data
DeepEcho
Synthetic Data Generation for mixed-type, multivariate time series.
Stars: ✭ 44 (-34.33%)
Mutual labels:  synthetic-data
table-evaluator
Evaluate real and synthetic datasets with each other
Stars: ✭ 44 (-34.33%)
Mutual labels:  synthetic-data
quality-requirements
Examples of quality requirements for software (for instance, to simplify ATAM analyses or Quality-Driven Software Architecture)
Stars: ✭ 61 (-8.96%)
Mutual labels:  quality
sonar-scala
A free and open-source SonarQube plugin for static code analysis of Scala projects.
Stars: ✭ 113 (+68.66%)
Mutual labels:  quality
augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Stars: ✭ 49 (-26.87%)
Mutual labels:  synthetic-data
VisDA2020
VisDA2020: 4th Visual Domain Adaptation Challenge in ECCV'20
Stars: ✭ 53 (-20.9%)
Mutual labels:  synthetic-data
Clustering-Datasets
This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.
Stars: ✭ 189 (+182.09%)
Mutual labels:  synthetic-data
zpy
Synthetic data for computer vision. An open source toolkit using Blender and Python.
Stars: ✭ 251 (+274.63%)
Mutual labels:  synthetic-data
ESP8266-HomeKit-Air-Quality-Sensor-Elgato-Eve-Room
ESP8266 based  Homekit Indoor Air Quality sensor that acts like Eve Room🌱
Stars: ✭ 58 (-13.43%)
Mutual labels:  quality
symfony-lts-docker-starter
🐳 Dockerized your Symfony project using a complete stack (Makefile, Docker-Compose, CI, bunch of quality insurance tools, tests ...) with a base according to up-to-date components and best practices.
Stars: ✭ 39 (-41.79%)
Mutual labels:  quality
NaturalGroundingPlayer
Sequence videos based on their energy readings
Stars: ✭ 46 (-31.34%)
Mutual labels:  quality
game-feature-learning
Code for paper "Cross-Domain Self-supervised Multi-task Feature Learning using Synthetic Imagery", Ren et al., CVPR'18
Stars: ✭ 68 (+1.49%)
Mutual labels:  synthetic-data
sonarlint4netbeans
SonarLint integration for Apache Netbeans
Stars: ✭ 23 (-65.67%)
Mutual labels:  quality
Robotics-Object-Pose-Estimation
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.
Stars: ✭ 153 (+128.36%)
Mutual labels:  synthetic-data
codeclimate-phpcodesniffer
Code Climate Engine for PHP Code Sniffer
Stars: ✭ 27 (-59.7%)
Mutual labels:  quality
uoais
Codes of paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling", ICRA 2022
Stars: ✭ 77 (+14.93%)
Mutual labels:  synthetic-data
python-test-reporter
DEPRECATED Uploads Python test coverage data to Code Climate
Stars: ✭ 18 (-73.13%)
Mutual labels:  quality

This repository is part of The Synthetic Data Vault Project, a project from DataCebo.


Overview

The SDMetrics library provides a set of dataset-agnostic tools for evaluating the quality of a synthetic database by comparing it to the real database that it is modeled after.

Important Links
💻 Website Check out the SDV Website for more information about the project.
📙 SDV Blog Regular publishing of useful content about Synthetic Data Generation.
📖 Documentation Quickstarts, User and Development Guides, and API Reference.
Repository The link to the GitHub Repository of this library.
📜 License The entire ecosystem is published under the MIT License.
⌨️ Development Status This software is in its Pre-Alpha stage.
Community Join our Slack Workspace for announcements and discussions.
Tutorials Run the SDV Tutorials in a Binder environment.

Features

It supports multiple data modalities:

  • Single Columns: Compare one-dimensional numpy arrays representing individual columns.
  • Column Pairs: Compare how columns in a pandas.DataFrame relate to each other, in groups of 2.
  • Single Table: Compare an entire table, represented as a pandas.DataFrame.
  • Multi Table: Compare multi-table and relational datasets represented as a Python dict with multiple tables passed as pandas.DataFrames.
  • Time Series: Compare tables representing ordered sequences of events.
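
As a rough illustration of the data shapes described above, the following sketch builds one toy example per modality using only numpy and pandas. The table names, column names and values are hypothetical and are not taken from the SDMetrics demo data.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative only.
users = pd.DataFrame({
    "user_id": [0, 1, 2],
    "age": [34, 51, 27],
    "country": ["ES", "US", "FR"],
})
sessions = pd.DataFrame({
    "session_id": [10, 11, 12, 13],
    "user_id": [0, 0, 1, 2],
    "timestamp": pd.to_datetime(
        ["2021-01-03", "2021-01-01", "2021-01-02", "2021-01-05"]
    ),
    "duration": [12.5, 3.0, 7.25, 1.0],
})

# Single column: a one-dimensional numpy array.
single_column = users["age"].to_numpy()

# Column pair: two columns of the same DataFrame.
column_pair = users[["age", "country"]]

# Single table: the whole DataFrame.
single_table = users

# Multi table: a dict mapping table names to DataFrames.
multi_table = {"users": users, "sessions": sessions}

# Time series: a table of events, ordered here per user by timestamp.
time_series = (
    sessions.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
)
```

The same real and synthetic data would be prepared in matching shapes before being passed to the metrics of the corresponding modality.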

It includes a variety of metrics such as:

  • Statistical metrics which use statistical tests to compare the distributions of the real and synthetic data.
  • Detection metrics which use machine learning to try to distinguish between real and synthetic data.
  • Efficacy metrics which compare the performance of machine learning models when run on the synthetic and real data.
  • Bayesian Network and Gaussian Mixture metrics which learn the distribution of the real data and evaluate the likelihood of the synthetic data belonging to the learned distribution.
  • Privacy metrics which evaluate whether the synthetic data is leaking information about the real data.
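
To make the idea of a statistical metric concrete, here is a minimal numpy-only sketch of an inverted two-sample Kolmogorov-Smirnov D statistic, where 1.0 means the empirical distributions coincide and 0.0 means they do not overlap at all. The function name `inverted_ks_statistic` is hypothetical, and this is not presented as the SDMetrics implementation.

```python
import numpy as np


def inverted_ks_statistic(real, synthetic):
    """Return 1 - D, with D the two-sample Kolmogorov-Smirnov statistic.

    Hypothetical helper for illustration; not part of the SDMetrics API.
    """
    real = np.sort(np.asarray(real, dtype=float))
    synthetic = np.sort(np.asarray(synthetic, dtype=float))

    # Evaluate both empirical CDFs at every observed value.
    points = np.concatenate([real, synthetic])
    cdf_real = np.searchsorted(real, points, side="right") / len(real)
    cdf_synth = np.searchsorted(synthetic, points, side="right") / len(synthetic)

    # D is the largest vertical gap between the two empirical CDFs.
    d_statistic = np.max(np.abs(cdf_real - cdf_synth))
    return 1.0 - d_statistic


identical = inverted_ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
shifted = inverted_ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = inverted_ks_statistic([0, 1, 2], [10, 11, 12])
```

Inverting the statistic keeps the convention that a higher score means the synthetic column is closer to the real one.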

Install

SDMetrics is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, SDMetrics can also be installed as a standalone library using the following commands:

Using pip:

pip install sdmetrics

Using conda:

conda install -c conda-forge -c pytorch sdmetrics

For more installation options, please visit the SDMetrics Installation Guide.

Usage

SDMetrics is included as part of the framework offered by SDV to evaluate the quality of your synthetic dataset. For more details about how to use it, please visit the corresponding User Guide.

Standalone usage

SDMetrics can also be used as a standalone library to run metrics individually.

In this short example we show how to use it to evaluate a toy multi-table dataset and its synthetic replica by running all the compatible multi-table metrics on it:

import sdmetrics

# Load the demo data, which includes:
# - A dict containing the real tables as pandas.DataFrames.
# - A dict containing the synthetic clones of the real data.
# - A dict containing metadata about the tables.
real_data, synthetic_data, metadata = sdmetrics.load_demo()

# Obtain the list of multi table metrics, which is returned as a dict
# containing the metric names and the corresponding metric classes.
metrics = sdmetrics.multi_table.MultiTableMetric.get_subclasses()

# Run all the compatible metrics and get a report
sdmetrics.compute_metrics(metrics, real_data, synthetic_data, metadata=metadata)

The output will be a table with all the details about the executed metrics and their score:

| metric | name | score | min_value | max_value | goal |
| --- | --- | --- | --- | --- | --- |
| CSTest | Chi-Squared | 0.76651 | 0 | 1 | MAXIMIZE |
| KSTest | Inverted Kolmogorov-Smirnov D statistic | 0.75 | 0 | 1 | MAXIMIZE |
| KSTestExtended | Inverted Kolmogorov-Smirnov D statistic | 0.777778 | 0 | 1 | MAXIMIZE |
| LogisticDetection | LogisticRegression Detection | 0.882716 | 0 | 1 | MAXIMIZE |
| SVCDetection | SVC Detection | 0.833333 | 0 | 1 | MAXIMIZE |
| BNLikelihood | BayesianNetwork Likelihood | nan | 0 | 1 | MAXIMIZE |
| BNLogLikelihood | BayesianNetwork Log Likelihood | nan | -inf | 0 | MAXIMIZE |
| LogisticParentChildDetection | LogisticRegression Detection | 0.619444 | 0 | 1 | MAXIMIZE |
| SVCParentChildDetection | SVC Detection | 0.916667 | 0 | 1 | MAXIMIZE |
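
A report like this can be post-processed with ordinary pandas operations. The sketch below assumes the report is a pandas.DataFrame with the columns shown; a few rows from the table are typed in by hand here so that the example does not depend on SDMetrics being installed.

```python
import numpy as np
import pandas as pd

# A few rows copied from the report above. In practice the DataFrame would
# come from sdmetrics.compute_metrics (assumed to return a DataFrame).
report = pd.DataFrame({
    "metric": ["CSTest", "KSTest", "LogisticDetection", "BNLikelihood"],
    "score": [0.76651, 0.75, 0.882716, np.nan],
    "min_value": [0.0, 0.0, 0.0, 0.0],
    "max_value": [1.0, 1.0, 1.0, 1.0],
    "goal": ["MAXIMIZE"] * 4,
})

# Drop the rows whose score is nan.
valid = report.dropna(subset=["score"])

# Every metric kept here has goal MAXIMIZE, so sort from best to worst.
ranked = valid.sort_values("score", ascending=False).reset_index(drop=True)
print(ranked[["metric", "score"]])
```

Sorting by raw score is only meaningful when the compared metrics share the same goal and value range, which is why the `goal`, `min_value` and `max_value` columns are worth checking first.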

What's next?

If you want to read more about each individual metric, please visit the following folders:




The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

  • 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
  • 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi table and time series data.
  • 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.
