frederick0329 / TracIn

License: Apache-2.0
Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)

Programming Languages: Jupyter Notebook, Python

TracIn

Implementation of Estimating Training Data Influence by Tracing Gradient Descent

Goal: Identify the influence of training data points on F(data point at inference time).

Idea: Trace stochastic gradient descent (using the loss function as F).
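
To make the idea concrete, here is a minimal pure-Python sketch, not code from this repo. It takes one SGD step on a single training point and compares the actual drop in a test point's loss with the first-order estimate, which is the learning rate times the dot product of the two loss gradients. The linear model, squared loss, data values, and all function names are assumptions chosen for illustration.

```python
# Toy linear model with squared loss; everything here is illustrative.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, x, y):
    return 0.5 * (predict(w, x) - y) ** 2

def grad(w, x, y):
    # Gradient of the squared loss with respect to the weights.
    err = predict(w, x) - y
    return [err * xi for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sgd_step(w, x, y, lr):
    g = grad(w, x, y)
    return [wi - lr * gi for wi, gi in zip(w, g)]

w = [0.5, -0.2]                   # current weights (hypothetical)
train_point = ([1.0, 2.0], 1.0)   # (features, label)
test_point = ([1.0, 1.5], 1.0)
lr = 0.01

# Actual change in the test loss caused by one step on the training point.
w_next = sgd_step(w, *train_point, lr)
actual_reduction = loss(w, *test_point) - loss(w_next, *test_point)

# First-order estimate: lr * <grad at training point, grad at test point>.
first_order_estimate = lr * dot(grad(w, *train_point), grad(w, *test_point))

print(actual_reduction, first_order_estimate)
```

With a small learning rate the two numbers agree closely. Summing such per-step estimates over training is what "tracing" gradient descent refers to here.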

Equation
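
For reference, the checkpoint-based estimator described in the paper (TracInCP) can be written as below. The notation is taken from the paper rather than from this page: z is a training point, z' is a test point, w_{t_i} are the weights at the i-th of k saved checkpoints, \eta_i is the step size used around that checkpoint, and \ell is the loss.

```latex
\mathrm{TracInCP}(z, z') \;=\; \sum_{i=1}^{k} \eta_i \, \nabla \ell(w_{t_i}, z) \cdot \nabla \ell(w_{t_i}, z')
```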

Broader Impact

This work proposes a practical technique to understand the influence of training data points on loss functions/predictions/differentiable metrics. The technique is easier to apply than previously proposed techniques, and we hope it is widely used to understand the quality and influence of training data. For most real world applications, the impact of improving the quality of training data is simply to improve the quality of the model. In this sense, we expect the broader impact to be positive.

Most of the implementation in this repo is provided as Colab notebooks. Consider reading the FAQ before adapting them to your own data.

Terminology

  • Proponents are training points with positive scores; the score is proportional to how much they reduce the loss.
  • Opponents are training points with negative scores; the score is proportional to how much they increase the loss.
  • Self-influence is the influence of a training point on its own loss.
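
The three terms above can be illustrated with a small pure-Python sketch. It scores each training point against a test point by summing, over saved checkpoints, the learning rate times the dot product of the two loss gradients. This is not the repo's Colab code: the linear model, the two checkpoint weight vectors, the learning rates, the data, and the function name `tracin_cp` are all hypothetical values picked so the arithmetic is easy to follow.

```python
# Toy linear model with squared loss; all values are illustrative.

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def grad(w, x, y):
    # Gradient of 0.5 * (prediction - y)^2 with respect to the weights.
    err = predict(w, x) - y
    return [err * xi for xi in x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def tracin_cp(checkpoints, lrs, z, z_prime):
    # Sum over checkpoints of lr * <grad loss at z, grad loss at z_prime>.
    return sum(
        lr * dot(grad(w, *z), grad(w, *z_prime))
        for w, lr in zip(checkpoints, lrs)
    )

# Hypothetical saved checkpoints and the learning rate used at each.
checkpoints = [[0.0, 0.0], [0.5, 0.5]]
learning_rates = [0.1, 0.1]

train = {
    "a": ([1.0, 0.0], 1.0),    # agrees with the test point
    "b": ([0.0, 1.0], 1.0),    # uses a feature the test point lacks
    "c": ([1.0, 0.0], -1.0),   # same features as "a", conflicting label
}
test_point = ([1.0, 0.0], 1.0)

scores = {
    name: tracin_cp(checkpoints, learning_rates, z, test_point)
    for name, z in train.items()
}
proponents = [name for name, s in scores.items() if s > 0]
opponents = [name for name, s in scores.items() if s < 0]

# Self-influence: score of a training point against itself.
self_influence = {
    name: tracin_cp(checkpoints, learning_rates, z, z)
    for name, z in train.items()
}

print(scores, proponents, opponents, self_influence)
```

In this toy setup "a" comes out as a proponent, "c" as an opponent, and "b" has a score of zero because its gradient is orthogonal to the test point's. The conflicting point "c" also has the largest self-influence. Self-influence is never negative, since it sums squared gradient norms scaled by the learning rate.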