All Projects → decile-team → spear

decile-team / spear

Licence: MIT license
SPEAR: Programmatically label and build training data quickly.

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to spear

Cleanlab
The standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.
Stars: ✭ 2,526 (+3018.52%)
Mutual labels:  weak-supervision, semi-supervised-learning, unsupervised-learning
catgan pytorch
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
Stars: ✭ 50 (-38.27%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
Susi
SuSi: Python package for unsupervised, supervised and semi-supervised self-organizing maps (SOM)
Stars: ✭ 42 (-48.15%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
Alibi Detect
Algorithms for outlier and adversarial instance detection, concept drift and metrics.
Stars: ✭ 604 (+645.68%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
metric-transfer.pytorch
Deep Metric Transfer for Label Propagation with Limited Annotated Data
Stars: ✭ 49 (-39.51%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
Awesome-Weak-Supervision
A curated list of programmatic weak supervision papers and resources
Stars: ✭ 77 (-4.94%)
Mutual labels:  weak-supervision, data-programming
L2c
Learning to Cluster. A deep clustering strategy.
Stars: ✭ 262 (+223.46%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
Self Supervised Speech Recognition
speech to text with self-supervised learning based on wav2vec 2.0 framework
Stars: ✭ 106 (+30.86%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
wrench
WRENCH: Weak supeRvision bENCHmark
Stars: ✭ 185 (+128.4%)
Mutual labels:  weak-supervision, data-programming
deepOF
TensorFlow implementation for "Guided Optical Flow Learning"
Stars: ✭ 26 (-67.9%)
Mutual labels:  semi-supervised-learning, unsupervised-learning
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-35.8%)
Mutual labels:  unsupervised-learning
semantic-parsing-dual
Source code and data for ACL 2019 Long Paper ``Semantic Parsing with Dual Learning".
Stars: ✭ 17 (-79.01%)
Mutual labels:  semi-supervised-learning
machine-learning
Programming Assignments and Lectures for Andrew Ng's "Machine Learning" Coursera course
Stars: ✭ 83 (+2.47%)
Mutual labels:  unsupervised-learning
T-CorEx
Implementation of linear CorEx and temporal CorEx.
Stars: ✭ 31 (-61.73%)
Mutual labels:  unsupervised-learning
Video-Summarization-Pytorch
IMPLEMENT AAAI 2018 - Unsupervised video summarization with deep reinforcement learning (PyTorch)
Stars: ✭ 23 (-71.6%)
Mutual labels:  unsupervised-learning
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (-25.93%)
Mutual labels:  semi-supervised-learning
proto
Proto-RL: Reinforcement Learning with Prototypical Representations
Stars: ✭ 67 (-17.28%)
Mutual labels:  unsupervised-learning
JCLAL
JCLAL is a general purpose framework developed in Java for Active Learning.
Stars: ✭ 22 (-72.84%)
Mutual labels:  semi-supervised-learning
NTFk.jl
Unsupervised Machine Learning: Nonnegative Tensor Factorization + k-means clustering
Stars: ✭ 36 (-55.56%)
Mutual labels:  unsupervised-learning
semi-supervised-NFs
Code for the paper Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Stars: ✭ 23 (-71.6%)
Mutual labels:  semi-supervised-learning

Lines of code PyPI docs license website



Semi-Supervised Data Programming for Data Efficient Machine Learning

SPEAR is a library for data programming with semi-supervision. The package implements several recent data programming approaches including facility to programmatically label and build training data.

Pipeline

  • Design Labeling functions(LFs)
  • generate pickle file containing labels by passing raw data to LFs
  • Use one of the Label Aggregators(LA) to get final labels



SPEAR provides functionality such as

  • development of LFs/rules/heuristics for quick labeling
  • compare against several data programming approaches
  • compare against semi-supervised data programming approaches
  • use subset selection to make best use of the annotation efforts
  • facility to store and save data in pickle file

Labelling Functions (LFs)

  • discrete LFs - Users can define LFs that return discrete labels
  • continuous LFs - return continuous scores/confidence to the labels assigned

Approaches Implemented

You can read this paper to know about below approaches

  • Only-L
  • Learning to Reweight
  • Posterior Regularization
  • Imply Loss
  • CAGE
  • Joint Learning

Data folder for SMS & TREC can be found here. This folder needs to be placed in the same directory as notebooks folder is in, to run the notebooks or examples.

Direct download of the zip file can be done via wget using gdown library .

pip install gdown
gdown 1CJZ73nNa7Ho0BOSDgGx9CRvXoepVSpet

Installation

  • Install Submodlib library pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Method 1

To install latest version of SPEAR package using PyPI:

pip install decile-spear

Method 2

SPEAR requires Python 3.6 or later. First install submodlib. Then install SPEAR:

git clone https://github.com/decile-team/spear.git
cd spear
pip install -r requirements/requirements.txt

Citation

@misc{abhishek2021spear,
      title={SPEAR : Semi-supervised Data Programming in Python}, 
      author={Guttu Sai Abhishek and Harshad Ingole and Parth Laturia and Vineeth Dorna and Ayush Maheshwari and Ganesh Ramakrishnan and Rishabh Iyer},
      year={2021},
      eprint={2108.00373},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Quick Links

Acknowledgment

SPEAR takes inspiration, builds upon, and uses pieces of code from several open source codebases. These include Snorkel, Snuba & Imply Loss. Also, SPEAR uses SUBMODLIB for subset selection, which is provided by DECILE too.

Team

SPEAR is created and maintained by Ayush, Abhishek, Vineeth, Harshad, Parth, Pankaj, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to have SPEAR more community driven. Please use it and contribute to it for your research, and feel free to use it for your commercial projects. We will add the major contributors here.

Publications

[1] Maheshwari, Ayush, et al. Data Programming using Semi-Supervision and Subset Selection, In Findings of ACL (Long Paper) 2021.

[2] Chatterjee, Oishik, Ganesh Ramakrishnan, and Sunita Sarawagi. Data Programming using Continuous and Quality-Guided Labeling Functions, In AAAI 2020.

[3] Sahay, Atul, et al. Rule augmented unsupervised constituency parsing, In Findings of ACL (Short Paper) 2021.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].