Aditi138 / EntityTargetedActiveLearning

Licence: other

Active Learning for Entity Recognition

Requirements

Python 2.7
DyNet, at commit 284838815ece9297a7100cc43035e1ea1b133a5

Data

In data/, create one directory per language, following the example in data/Spanish. Download the CoNLL train/dev/test NER datasets for that language here. The LDC datasets require the appropriate access from LDC.

To store the trained models, create a directory named saved_models in the parent folder.

Embeddings

Combine monolingual data acquired from Wikipedia with the plain text extracted from the labeled data, then train 100-dimensional GloVe embeddings on the combined text.
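The plain-text extraction step can be sketched as follows. This is a minimal stand-alone helper, not part of the repository: the function name `conll_to_plain_text` and the file names are hypothetical, and it assumes the labeled data has one token per line (token in the first column) with blank lines between sentences.

```python
import io


def conll_to_plain_text(conll_path, out_path):
    """Write one space-joined sentence per line from a CoNLL-style file.

    Hypothetical helper: assumes the token is the first whitespace-separated
    column and that sentences are separated by blank lines.
    Returns the number of sentences written.
    """
    n_sentences = 0
    tokens = []
    with io.open(conll_path, encoding="utf-8") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line or line.startswith(u"-DOCSTART-"):
                # Sentence boundary (or document marker): flush the buffer.
                if tokens:
                    dst.write(u" ".join(tokens) + u"\n")
                    n_sentences += 1
                    tokens = []
                continue
            tokens.append(line.split()[0])
        if tokens:
            # Flush the last sentence if the file has no trailing blank line.
            dst.write(u" ".join(tokens) + u"\n")
            n_sentences += 1
    return n_sentences
```

The output file can then be concatenated with the Wikipedia text before training the embeddings.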

Active Learning Simulation

The best NER performance was obtained with the fine-tuning training scheme. The scripts below run simulated active learning experiments for different active learning strategies. Run them from the commands directory:

cd commands

  • ETAL + Partial-CRF + CT (Proposed recipe)
    ./ETAL_PARTIAL_CRF_CT.sh
  • ETAL + Full-CRF + CT
    ./ETAL_FULL_CRF_CT.sh
  • CFEAL + Full-CRF + CT
    ./CFEAL_PARTIAL_CRF_CT.sh
  • SAL + CT
    ./SAL_CT.sh
Things to note:

The vocabulary is loaded from the path given by --aug_lang_train_path. Therefore, create a CoNLL-formatted file with dummy labels from the unlabeled text. For our experiments, we concatenated the transferred data with the unlabeled data (which was the entire training dataset) into a single CoNLL-formatted file. The CoNLL format is a tab-separated, two-column format, as shown below:

El O
grupo O
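One way to build such a vocabulary file is sketched below. This is a stand-alone illustration rather than code from the repository: the function names `text_to_dummy_conll` and `concat_conll` are hypothetical, the dummy label is assumed to be `O` as in the example above, and the unlabeled text is assumed to be whitespace-tokenized with one sentence per line.

```python
import io

DUMMY_LABEL = u"O"  # assumption: any valid label works as a placeholder


def text_to_dummy_conll(text_path, out_path, label=DUMMY_LABEL):
    """Convert tokenized text (one sentence per line) to token<TAB>label lines."""
    with io.open(text_path, encoding="utf-8") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            tokens = line.split()
            if not tokens:
                continue  # skip empty lines in the input
            for token in tokens:
                dst.write(u"%s\t%s\n" % (token, label))
            dst.write(u"\n")  # blank line marks the sentence boundary


def concat_conll(paths, out_path):
    """Concatenate CoNLL files, keeping one blank line between them."""
    with io.open(out_path, "w", encoding="utf-8") as dst:
        for path in paths:
            with io.open(path, encoding="utf-8") as src:
                content = src.read().rstrip(u"\n")
            if content:
                dst.write(content + u"\n\n")
```

For example, `concat_conll(["transferred_data.conll", "unlabeled_dummy.conll"], "vocab.conll")` would produce a single file to pass to `--aug_lang_train_path`; the input file names here are placeholders.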

The LDC NER label set differs from the CoNLL label set by one tag. Therefore, add --misc to the arguments when running any experiment on CoNLL data. The label set is hard-coded in data_loaders/data_loader.py.
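A quick check of which entity types a dataset contains can help decide whether --misc is needed. The sketch below is a hypothetical stand-alone helper, not the repository's data loader. It assumes labels are in the last column in a prefix-type form such as B-LOC, and it assumes, based only on the flag name, that the extra CoNLL tag is MISC.

```python
import io


def entity_types(conll_path):
    """Collect entity types (e.g. LOC, PER) from the last column of a CoNLL file."""
    types = set()
    with io.open(conll_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # blank line or malformed row
            label = parts[-1]
            if label != u"O" and u"-" in label:
                # Strip the B-/I- style prefix and keep the entity type.
                types.add(label.split(u"-", 1)[1])
    return types


def needs_misc_flag(conll_path):
    """Assumption: the tag that separates CoNLL from LDC data is MISC."""
    return u"MISC" in entity_types(conll_path)
```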

Cross-Lingual Transferred Data

We used the model proposed by (Xie et al. 2018) to get the cross-lingually transferred data from English. Please refer to their code here.

For the fine-tuning training scheme, train a base NER model on the transferred data as follows:

MODEL_NAME="spanish_full_transfer_baseline"
python -u ../main.py \
    --dynet-seed 3278657 \
    --word_emb_dim 100 \
    --batch_size 10 \
    --model_name ${MODEL_NAME} \
    --lang es \
    --fixedVocab \
    --test_conll \
    --tot_epochs 1000 \
    --aug_lang_train_path $DATA/vocab.conll \
    --init_lr 0.015 \
    --valid_freq 1300 \
    --misc \
    --pretrain_emb_path $DATA/esp.vec \
    --dev_path $DATA/esp.dev \
    --test_path $DATA/esp.test \
    --train_path $DIR/transferred_data.conll  2>&1 | tee ${MODEL_NAME}.log 

References

If you use this software for research purposes, we would appreciate it if you cite the following:

@inproceedings{chaudhary19emnlp,
    title = {A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers},
    author = {Aditi Chaudhary and Jiateng Xie and Zaid Sheikh and Graham Neubig and Jaime Carbonell},
    booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    address = {Hong Kong},
    month = {November},
    url = {http://arxiv.org/abs/1908.08983},
    year = {2019}
}

Contact

For any issues, please feel free to reach out to [email protected].
