Aditi138 / EntityTargetedActiveLearning



Active Learning for Entity Recognition

Requirements

Python 2.7
DyNet, version at commit 284838815ece9297a7100cc43035e1ea1b133a5

Data

In data/, create one directory per language, following the layout shown for data/Spanish. Download the CoNLL train/dev/test NER datasets for that language. The LDC datasets require the appropriate access to obtain.

To store the trained models, create a directory named saved_models in the parent folder.

Embeddings

Combine monolingual data acquired from Wikipedia with the plain text extracted from the labeled data, then train 100-dimensional GloVe embeddings on the combined text.
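
As a minimal sketch of the text-extraction step, the function below pulls the token column out of a CoNLL file and writes one space-joined sentence per line, a plain-text form that can be concatenated with the Wikipedia text before training embeddings. The function name conll_to_plain_text and the file names in the usage line are hypothetical and not part of this repository. It assumes the token is the first column and that blank lines separate sentences.

```python
import io


def conll_to_plain_text(conll_path, out_path):
    """Write one space-joined sentence per line from a CoNLL file.

    Assumes the token is the first whitespace-separated column and that
    blank lines separate sentences. Returns the number of sentences written.
    """
    n_sentences = 0
    tokens = []
    with io.open(conll_path, encoding="utf-8") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if line:
                # Keep only the token; the label column is discarded.
                tokens.append(line.split()[0])
            elif tokens:
                dst.write(u" ".join(tokens) + u"\n")
                n_sentences += 1
                tokens = []
        if tokens:
            # Flush the last sentence if the file has no trailing blank line.
            dst.write(u" ".join(tokens) + u"\n")
            n_sentences += 1
    return n_sentences
```

For example, conll_to_plain_text("esp.train", "esp.train.txt") would produce a file with one sentence per line. Document markers such as -DOCSTART- lines, if a dataset contains them, are not filtered by this sketch.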

Active Learning Simulation

The best NER performance was obtained with the fine-tuning training scheme. The scripts below run simulated active learning for each active learning strategy. Run them from the commands directory:

cd commands

ETAL + Partial-CRF + CT (proposed recipe):
    ./ETAL_PARTIAL_CRF_CT.sh
ETAL + Full-CRF + CT:
    ./ETAL_FULL_CRF_CT.sh
CFEAL + Full-CRF + CT:
    ./CFEAL_PARTIAL_CRF_CT.sh
SAL + CT:
    ./SAL_CT.sh

Things to note:

The vocabulary is loaded from the path given by --aug_lang_train_path. Therefore, create a CoNLL-formatted file with dummy labels from the unlabeled text. For our experiments, we concatenated the transferred data with the unlabeled data (which was the entire training dataset) into a single CoNLL-formatted file. The CoNLL format is a tab-separated, two-column format, as shown below:

El O
grupo O

The LDC NER label set differs from the CoNLL label set by one tag. Therefore, add --misc to the arguments when running any experiment on CoNLL data. The label set is hard-coded in data_loaders/data_loader.py.
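
The dummy-label vocabulary file described in the first note can be produced with a few lines of Python. This is a minimal sketch, not the authors' script: the function name write_dummy_conll, the file names, and the assumption that the unlabeled text is already tokenized with one sentence per line are all hypothetical. It uses O as the dummy label and writes one tab-separated token/label pair per line, with a blank line after each sentence.

```python
import io


def write_dummy_conll(text_path, out_path, dummy_label=u"O"):
    """Convert tokenized plain text (one sentence per line) into a
    two-column, tab-separated CoNLL file with a dummy label on every token.

    Returns the number of tokens written.
    """
    n_tokens = 0
    with io.open(text_path, encoding="utf-8") as src, \
            io.open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            tokens = line.split()
            if not tokens:
                continue  # skip empty lines so no empty sentences are written
            for token in tokens:
                dst.write(token + u"\t" + dummy_label + u"\n")
                n_tokens += 1
            dst.write(u"\n")  # blank line marks the sentence boundary
    return n_tokens
```

For example, write_dummy_conll("unlabeled.txt", "unlabeled.conll") writes the dummy-labeled file, which can then be concatenated with the transferred data and passed via --aug_lang_train_path.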

Cross-Lingual Transferred Data

We used the model proposed by Xie et al. (2018) to obtain the cross-lingually transferred data from English. Please refer to their code for details.

For the fine-tuning training scheme, train a base NER model on the transferred data as follows:

MODEL_NAME="spanish_full_transfer_baseline"
python -u ../main.py \
    --dynet-seed 3278657 \
    --word_emb_dim 100 \
    --batch_size 10 \
    --model_name ${MODEL_NAME} \
    --lang es \
    --fixedVocab \
    --test_conll \
    --tot_epochs 1000 \
    --aug_lang_train_path $DATA/vocab.conll \
    --init_lr 0.015 \
    --valid_freq 1300 \
    --misc \
    --pretrain_emb_path $DATA/esp.vec \
    --dev_path $DATA/esp.dev \
    --test_path $DATA/esp.test \
    --train_path $DIR/transferred_data.conll  2>&1 | tee ${MODEL_NAME}.log

References

If you use this software for research purposes, we would appreciate a citation of the following paper:

@inproceedings{chaudhary19emnlp,
    title = {A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers},
    author = {Aditi Chaudhary and Jiateng Xie and Zaid Sheikh and Graham Neubig and Jaime Carbonell},
    booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    address = {Hong Kong},
    month = {November},
    url = {http://arxiv.org/abs/1908.08983},
    year = {2019}
}

Contact

For any issues, please feel free to reach out to [email protected].
