
awasthiabhijeet / Learning-From-Rules

License: Apache-2.0
Implementation of experiments in the paper "Learning from Rules Generalizing Labeled Exemplars", ICLR 2020 (https://openreview.net/forum?id=SkeuexBtDr)

Programming Languages

python
shell
perl

Projects that are alternatives to or similar to Learning-From-Rules

weasel
Weakly Supervised End-to-End Learning (NeurIPS 2021)
Stars: ✭ 117 (+154.35%)
Mutual labels:  weak-supervision, weakly-supervised-learning
concept-based-xai
Library implementing state-of-the-art Concept-based and Disentanglement Learning methods for Explainable AI
Stars: ✭ 41 (-10.87%)
Mutual labels:  weak-supervision, weakly-supervised-learning
awesome-graph-self-supervised-learning
Awesome Graph Self-Supervised Learning
Stars: ✭ 805 (+1650%)
Mutual labels:  representation-learning, data-augmentation
Snorkel
A system for quickly generating training data with weak supervision
Stars: ✭ 4,953 (+10667.39%)
Mutual labels:  weak-supervision, data-augmentation
ASTRA
Self-training with Weak Supervision (NAACL 2021)
Stars: ✭ 127 (+176.09%)
Mutual labels:  weak-supervision, weakly-supervised-learning
trove
Weakly supervised medical named entity classification
Stars: ✭ 55 (+19.57%)
Mutual labels:  weak-supervision, weakly-supervised-learning
Stylealign
[ICCV 2019] Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Transition
Stars: ✭ 172 (+273.91%)
Mutual labels:  representation-learning, data-augmentation
knodle
A PyTorch-based open-source framework that provides methods for improving weakly annotated data and allows researchers to efficiently develop and compare their own methods.
Stars: ✭ 76 (+65.22%)
Mutual labels:  weak-supervision, weakly-supervised-learning
WeFEND-AAAI20
Dataset for paper "Weak Supervision for Fake News Detection via Reinforcement Learning" published in AAAI'2020.
Stars: ✭ 67 (+45.65%)
Mutual labels:  weak-supervision, weakly-supervised-learning
wrench
WRENCH: Weak supeRvision bENCHmark
Stars: ✭ 185 (+302.17%)
Mutual labels:  weak-supervision, weakly-supervised-learning
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (+76.09%)
Mutual labels:  representation-learning
Advances-in-Label-Noise-Learning
A curated (most recent) list of resources for Learning with Noisy Labels
Stars: ✭ 360 (+682.61%)
Mutual labels:  weakly-supervised-learning
CAPRICEP
An extended TSP (Time Stretched Pulse). CAPRICEP substantially replaces FVN. CAPRICEP enables interactive and real-time measurement of the linear time-invariant, the non-linear time-invariant, and random and time varying responses simultaneously.
Stars: ✭ 23 (-50%)
Mutual labels:  data-augmentation
ccgl
TKDE 2022. CCGL: Contrastive Cascade Graph Learning.
Stars: ✭ 20 (-56.52%)
Mutual labels:  data-augmentation
GaNDLF
A generalizable application framework for segmentation, regression, and classification using PyTorch
Stars: ✭ 77 (+67.39%)
Mutual labels:  data-augmentation
anatome
Ἀνατομή is a PyTorch library to analyze representation of neural networks
Stars: ✭ 50 (+8.7%)
Mutual labels:  representation-learning
weakly-action-localization
No description or website provided.
Stars: ✭ 30 (-34.78%)
Mutual labels:  weakly-supervised
SEC-tensorflow
a tensorflow version for SEC approach in the paper "seed, expand and constrain: three principles for weakly-supervised image segmentation".
Stars: ✭ 35 (-23.91%)
Mutual labels:  weakly-supervised
Representation-Learning-for-Information-Extraction
Pytorch implementation of Paper by Google Research - Representation Learning for Information Extraction from Form-like Documents.
Stars: ✭ 82 (+78.26%)
Mutual labels:  representation-learning
PointCutMix
our code for paper 'PointCutMix: Regularization Strategy for Point Cloud Classification'
Stars: ✭ 42 (-8.7%)
Mutual labels:  data-augmentation

LEARNING FROM RULES GENERALIZING LABELED EXEMPLARS (ICLR 2020)

This repository provides an implementation of the experiments in our ICLR 2020 paper:

@inproceedings{
Awasthi2020Learning,
title={Learning from Rules Generalizing Labeled Exemplars},
author={Abhijeet Awasthi and Sabyasachi Ghosh and Rasna Goyal and Sunita Sarawagi},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=SkeuexBtDr}
}

Requirements

This code has been developed with

  • python 3.6
  • tensorflow 1.12.0
  • numpy 1.17.2
  • snorkel 0.9.1
  • tensorflow_hub 0.7.0

Data Description

We have currently released processed versions of the 4 datasets used in our paper. They can be found in the data/ directory.

data/TREC (or any other dataset directory) contains the following four pickle files:

  • d_processed.p (d set: labeled data -- in the paper we refer to this as the "L" dataset)
  • U_processed.p (U set: unlabeled data -- referred to as the "U" dataset in the paper as well)
  • validation_processed.p (validation data)
  • test_processed.p (test data)
  • NOTE: U_processed.p for YOUTUBE and MITR is unavailable on GitHub due to its larger size. You can download the entire data directory from this link.

The following objects are dumped inside each pickle file (a loading sketch follows this list):

  • x : feature representation of instances
    • shape : [num_instances, num_features]
  • l : Class Labels assigned by rules
    • shape : [num_instances, num_rules]
    • class labels belong to {0, 1, 2, .. num_classes-1}
    • l[i][j] is the class label assigned by the jth rule to the ith instance
    • if the jth rule doesn't cover the ith instance, then l[i][j] = num_classes (our convention)
    • in snorkel, the convention is to keep l[i][j] = -1 if the jth rule doesn't cover the ith instance
  • m : Rule coverage mask
    • A binary matrix of shape [num_instances, num_rules]
    • m[i][j] = 1 if the jth rule covers the ith instance
    • m[i][j] = 0 otherwise
  • L : Instance labels
    • shape : [num_instances, 1]
    • L[i] = label of the ith instance, if a label is available, i.e. if the instance is from the labeled set d
    • Else, L[i] = num_classes if the instance comes from the unlabeled set U
    • class labels belong to {0, 1, 2, .. num_classes-1}
  • d : binary matrix of shape [num_instances, 1]
    • d[i]=1 if instance belongs to labeled data (d), d[i]=0 otherwise
    • d[i]=1 for all instances in d_processed.p
    • d[i]=0 for all instances in other 3 pickles {U,validation,test}_processed.p
  • r : A binary matrix of shape [num_instances, num_rules]
    • r[i][j]=1 if the jth rule was associated with the ith instance
    • Highly sparse matrix
    • r is a 0 matrix in all the pickles except d_processed.p
    • Note that this is different from rule coverage mask "m"
    • This matrix defines the coupled (rule, exemplar) pairs.
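
The pickles can be inspected with a short script. Below is a minimal loading sketch (not code from this repo) that assumes the six objects were dumped sequentially in the order listed above; the dataset path and num_classes value are illustrative:

```python
import pickle

import numpy as np

# Minimal loading sketch: assumes the six objects described above were
# dumped sequentially into each pickle, in the order listed.
def load_processed(path):
    with open(path, "rb") as f:
        x = pickle.load(f)  # [num_instances, num_features] instance features
        l = pickle.load(f)  # [num_instances, num_rules] rule-assigned labels
        m = pickle.load(f)  # [num_instances, num_rules] rule coverage mask
        L = pickle.load(f)  # [num_instances, 1] instance labels
        d = pickle.load(f)  # [num_instances, 1] labeled-set indicator
        r = pickle.load(f)  # [num_instances, num_rules] rule-exemplar coupling
    return x, l, m, L, d, r

x, l, m, L, d, r = load_processed("data/TREC/d_processed.p")
num_classes = 6  # TREC (Question dataset) has 6 classes
# Uncovered (i, j) entries should carry the num_classes convention:
assert np.all(np.asarray(l)[np.asarray(m) == 0] == num_classes)
```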

Usage

From src/hls

  • For reproducing numbers in Table 1, Row 1
    • python3 get_rule_related_statistics.py ../../data/TREC 6 None
    • This also provides the Majority Vote accuracy in Table 2, Column 2 (Question dataset); an illustrative computation is sketched after this list
  • For training, saving and testing a snorkel model
    • python3 run_snorkel.py ../../data/TREC 6 None
    • (RUN THIS BEFORE EXPERIMENTS WHICH DEPEND ON SNORKEL LABELS) if a snorkel model is not already saved in the dataset directory.
    • We have released pre-trained snorkel models in each dataset directory under the name "saved_label_model".
  • For reproducing (approximately) the numbers in Table 2, Column 2 (Question dataset)
    • use train_TREC.sh for training models for different loss functions
    • use test_TREC.sh for testing models for different loss functions
    • best hyperparameters are already set in these scripts
    • both of the above scripts use TREC.sh
  • For reproducing numbers (approximately) for the other datasets, follow the same steps as above with TREC replaced by the dataset name.
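
For reference, the majority-vote baseline mentioned above can be computed directly from the l, m, and L matrices described earlier. This is an illustrative sketch of that computation, not the exact implementation in get_rule_related_statistics.py:

```python
import numpy as np

# Illustrative majority vote over rule-assigned labels; accuracy is
# measured only on instances covered by at least one rule.
def majority_vote_accuracy(l, m, L, num_classes):
    preds = np.full(len(l), num_classes)        # num_classes == abstain
    for i in range(len(l)):
        fired = l[i][m[i] == 1].astype(int)     # labels from rules covering i
        if fired.size > 0:
            preds[i] = np.bincount(fired, minlength=num_classes).argmax()
    covered = preds != num_classes
    return np.mean(preds[covered] == np.ravel(L)[covered])
```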

Note:

  • The f network refers to the classification network
  • The w network refers to the rule network

File Description in src/hls

  • analyze_w_predictions.py - Used for diagnostics (Old Precision Vs Denoised Precision in Figure 3)
  • checkpoint.py - Load/Save checkpoints (Uses code from checkmate)
  • config.py - All configuration options go here
  • data_feeders.py - all kinds of data handling for training and testing.
  • data_feeder_utils.py - Load train/test data from processed pickles
  • data_utils.py - Other utilities related to data processing
  • generalized_cross_entropy_utils.py - Implementation of a noise-tolerant loss function
  • get_rule_related_statistics.py - For reproducing numbers in Table 1
  • hls_data_types.py - some basic data types used in data_feeders.py
  • hls_model.py - Creates train ops. All the loss functions are defined here
  • hls_test.py - Runs inference using f or w.
    • Inference on f tests the classification network (valid for all the loss functions)
    • Inference on w is used to analyze the denoised rule-precision obtained by w network
    • Inference on w is only meaningful for ImplyLoss and Posterior Reg. method since only these involve a rule (w) network.
  • hls_train.py - Two modes:
    • f_d (simply trains f network on labeled data)
    • f_d_U : used for all other modes which utilize unlabeled data
  • learn2reweight_utils.py - utilities for implementing L2R method
  • main.py - entry point
  • metrics_utils.py - utilities for computing metrics
  • networks.py - implementation of f network (classification network) and w network (rule network)
  • pr_utils.py - utilities for implementing Posterior Reg. method
  • run_snorkel.py - training, saving and testing a snorkel model
  • snorkel_utils.py - utility to convert l in our format to l in snorkel's format (a conversion sketch follows this list)
  • test_"DATASET_NAME".sh - model testing (inference) script
    • e.g. test_TREC.sh runs inference for models trained on TREC dataset
  • "train_"DATASET_NAME".sh - model training script
    • e.g. train_TREC.sh trains models on TREC dataset
  • "DATASET_NAME".sh - test_"DATASET_NAME".sh and train_"DATASET_NAME".sh use "DATASET_NAME".sh
  • utils.py - misc. utilities
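
As noted in the snorkel_utils.py entry above, the two label formats differ only in how abstentions are encoded (num_classes here vs. -1 in snorkel). A minimal conversion sketch, with an illustrative function name rather than the repo's actual API:

```python
import numpy as np

# Map this repo's abstain convention (l[i][j] == num_classes) to
# snorkel's convention (-1); the function name is illustrative.
def to_snorkel_labels(l, num_classes):
    l_snorkel = np.asarray(l).astype(int)  # astype returns a copy
    l_snorkel[l_snorkel == num_classes] = -1
    return l_snorkel

l_snorkel = to_snorkel_labels(l, num_classes=6)  # e.g. for TREC
```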