
HazyResearch / reef

License: Apache-2.0
Automatically labeling training data

Programming Languages

  • Jupyter Notebook (11,667 projects)
  • Python (139,335 projects; #7 most used programming language)

Projects that are alternatives of or similar to reef

Fullstackmachinelearning
Mostly free resources for end-to-end machine learning engineering, including open courses from CalTech, Columbia, Berkeley, MIT, and Stanford (in alphabetical order).
Stars: ✭ 39 (-61.76%)
Mutual labels:  stanford
Journalism Syllabi
Computer-Assisted Reporting and Data Journalism Syllabuses, compiled by Dan Nguyen
Stars: ✭ 136 (+33.33%)
Mutual labels:  stanford
Cs231a Notes
The course notes for Stanford's CS231A course on computer vision
Stars: ✭ 230 (+125.49%)
Mutual labels:  stanford
Cs193p Ios9 Solutions
My solutions to the assignments for Stanford's CS193P: Developing iOS 9 Apps with Swift [Spring 2016]
Stars: ✭ 42 (-58.82%)
Mutual labels:  stanford
Stanford Tensorflow Tutorials
This repository contains code examples for Stanford's course: TensorFlow for Deep Learning Research.
Stars: ✭ 10,098 (+9800%)
Mutual labels:  stanford
Cs193p Fall 2017
These are the lectures, slides, reading assignments, and problem sets for the Developing Apps for iOS 11 with Swift 4 CS193p course offered at the Stanford School of Engineering and available on iTunes U.
Stars: ✭ 141 (+38.24%)
Mutual labels:  stanford
Stanford self driving car code
Stanford Code From Cars That Entered DARPA Grand Challenges
Stars: ✭ 687 (+573.53%)
Mutual labels:  stanford
stanford-beamer-presentation
This is an unofficial LaTeX Beamer presentation template for Stanford University.
Stars: ✭ 47 (-53.92%)
Mutual labels:  stanford
Cs193p 2020 Swiftui
📘 Stanford CS193p Spring 2020 - Developing Apps for iOS (SwiftUI)
Stars: ✭ 135 (+32.35%)
Mutual labels:  stanford
Cs224n 2019
My completed implementation solutions for CS224N 2019
Stars: ✭ 178 (+74.51%)
Mutual labels:  stanford
Actionroguelike
Third-person Action Roguelike made in Unreal Engine C++ (for Stanford CS193U 2020)
Stars: ✭ 1,121 (+999.02%)
Mutual labels:  stanford
Pynlp
A pythonic wrapper for Stanford CoreNLP.
Stars: ✭ 103 (+0.98%)
Mutual labels:  stanford
Stanford Cs229
Python solutions to the problem sets of Stanford's graduate course on Machine Learning, taught by Prof. Andrew Ng
Stars: ✭ 151 (+48.04%)
Mutual labels:  stanford
Simple Cryptography
Scripts that illustrate basic cryptography concepts based on the Coursera Stanford Cryptography I course and more.
Stars: ✭ 40 (-60.78%)
Mutual labels:  stanford
Weld
High-performance runtime for data analytics applications
Stars: ✭ 2,709 (+2555.88%)
Mutual labels:  stanford
Stanford dbclass
Collection of my solutions to the (infamous) dbclass (2014 version) offered by Stanford.
Stars: ✭ 35 (-65.69%)
Mutual labels:  stanford
Datasciencecoursera
Data Science repo and blog for Johns Hopkins Coursera courses. Please let me know if you have any questions.
Stars: ✭ 1,928 (+1790.2%)
Mutual labels:  stanford
MCIS wsss
Code for ECCV 2020 paper (oral): Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
Stars: ✭ 151 (+48.04%)
Mutual labels:  weakly-supervised-learning
Stanford Cs231
Resources for students in the Udacity's Machine Learning Engineer Nanodegree to work through Stanford's Convolutional Neural Networks for Visual Recognition course (CS231n).
Stars: ✭ 249 (+144.12%)
Mutual labels:  stanford
Cs253.stanford.edu
CS 253 Web Security course at Stanford University
Stars: ✭ 155 (+51.96%)
Mutual labels:  stanford

Reef: Overcoming the Barrier to Labeling Training Data

Code for the VLDB 2019 paper Snuba: Automating Weak Supervision to Label Training Data.

Reef is an automated system for labeling training data based on a small labeled dataset. Reef utilizes ideas from program synthesis to automatically generate a set of interpretable heuristics that are then used to label unlabeled training data efficiently.

Installation

Reef uses Python 2. The Python package requirements are listed in requirements.txt. If you have Snorkel installed, you can set the corresponding flag to True; otherwise, a simple version of learning heuristic accuracies is included in this repo as well.

Reef Workflow Overview

The inputs to Reef are the following:

  • A labeled dataset, which contains a numerical feature matrix and a vector of ground-truth labels (currently, only binary classification is supported)
  • An unlabeled dataset, which contains a numerical feature matrix
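Concretely, the two inputs can be thought of as NumPy arrays like the following (a minimal sketch; the array names and shapes are illustrative, not Reef's actual API):

```python
import numpy as np

# Labeled set: numerical feature matrix plus binary ground-truth labels
X_labeled = np.random.rand(100, 10)            # 100 points, 10 numerical features
y_labeled = np.random.choice([-1, 1], size=100)  # binary labels, e.g. -1 / +1

# Unlabeled set: numerical feature matrix only, over the same feature space
X_unlabeled = np.random.rand(5000, 10)

print(X_labeled.shape, y_labeled.shape, X_unlabeled.shape)
```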

Reef follows the workflow below to label training data automatically. The overall process is encoded in the notebook [1] generate_reef_labels.ipynb and in the main file program_synthesis/heuristic_generator.py.

  1. Using the labeled dataset, Reef generates heuristics such as decision trees or small logistic regression models. The synthesis code is in program_synthesis/synthesizer.py.
    1. A heuristic is generated for each possible combination of c features, where c is the cardinality. For example, with c=1 and 10 features, 10 heuristics are generated.
    2. For each generated heuristic, a beta parameter is calculated. This represents the minimum confidence level at which the heuristic will assign a label, and it is chosen by maximizing the F1 score on the labeled dataset.
  2. These heuristics are passed to a pruner, which selects the best heuristic by maximizing a combination of its F1 score on the labeled dataset and its diversity, measured by how many points it labels that previously selected heuristics do not.
  3. The selected heuristic and previously chosen heuristics are then passed to the verifier, which learns accuracies for the heuristics based on the labels they assign to the unlabeled dataset.
  4. Finally, Reef calculates the probabilistic labels the heuristics assign to the labeled dataset and passes datapoints with low-confidence labels back to the synthesizer. This procedure repeats iteratively.
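Steps 1 and 1b above can be sketched as follows. This is a simplified, illustrative reconstruction, not Reef's actual code: the function names, the depth-1 decision trees, and the beta grid are all assumptions; the real logic lives in program_synthesis/synthesizer.py and program_synthesis/heuristic_generator.py.

```python
from itertools import combinations

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def f1(preds, y):
    """F1 of the +1 class, counting abstentions (0) against recall."""
    tp = np.sum((preds == 1) & (y == 1))
    fp = np.sum((preds == 1) & (y == -1))
    fn = np.sum((preds != 1) & (y == 1))
    return 2.0 * tp / (2 * tp + fp + fn) if tp else 0.0

def synthesize(X, y, c=1):
    """Step 1: fit one shallow heuristic per combination of c features."""
    heuristics = []
    for feats in combinations(range(X.shape[1]), c):
        h = DecisionTreeClassifier(max_depth=1).fit(X[:, feats], y)
        heuristics.append((feats, h))
    return heuristics

def apply_with_beta(h, feats, X, beta):
    """Assign +1/-1 only when confidence exceeds 0.5 + beta; else abstain (0)."""
    probs = h.predict_proba(X[:, feats])[:, 1]
    preds = np.zeros(X.shape[0])
    preds[probs >= 0.5 + beta] = 1
    preds[probs <= 0.5 - beta] = -1
    return preds

def find_beta(h, feats, X, y):
    """Step 1b: choose beta by maximizing F1 on the labeled dataset."""
    betas = np.linspace(0.0, 0.45, 10)
    scores = [f1(apply_with_beta(h, feats, X, b), y) for b in betas]
    return betas[int(np.argmax(scores))]

# Toy demo: label depends only on feature 0, so the first heuristic is perfect.
rng = np.random.RandomState(0)
X_lab = rng.rand(200, 5)
y_lab = np.where(X_lab[:, 0] > 0.5, 1, -1)

feats, h = synthesize(X_lab, y_lab, c=1)[0]
beta = find_beta(h, feats, X_lab, y_lab)
print(round(f1(apply_with_beta(h, feats, X_lab, beta), y_lab), 2))  # prints 1.0
```

The pruner and verifier (steps 2–4) would then score each candidate by combining this F1 with coverage diversity, and learn per-heuristic accuracies over the unlabeled set.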

Tutorial

The tutorial notebooks are based on a text-based plot-classification dataset. We go through generating heuristics with Reef and then train a simple LSTM model to see how an end model trained with Reef's labels compares to one trained with ground-truth training labels.
