
Alibaba-NLP / CLNER

Licence: other
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

Programming Languages

python
139335 projects - #7 most used programming language
perl
6916 projects

Projects that are alternatives of or similar to CLNER

nervaluate
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Stars: ✭ 40 (-20%)
Mutual labels:  named-entity-recognition
bern
A neural named entity recognition and multi-type normalization tool for biomedical text mining
Stars: ✭ 151 (+202%)
Mutual labels:  named-entity-recognition
IE Paper Notes
Paper notes for Information Extraction, including Relation Extraction (RE), Named Entity Recognition (NER), Entity Linking (EL), Event Extraction (EE), Named Entity Disambiguation (NED).
Stars: ✭ 14 (-72%)
Mutual labels:  named-entity-recognition
BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (+128%)
Mutual labels:  named-entity-recognition
presidio-research
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
Stars: ✭ 62 (+24%)
Mutual labels:  named-entity-recognition
watchman
Watchman: An open-source social-media event-detection system
Stars: ✭ 18 (-64%)
Mutual labels:  named-entity-recognition
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+202%)
Mutual labels:  named-entity-recognition
clinical concept extraction
Clinical Concept Extraction with Contextual Word Embedding
Stars: ✭ 34 (-32%)
Mutual labels:  named-entity-recognition
mitie-ruby
Named-entity recognition for Ruby
Stars: ✭ 77 (+54%)
Mutual labels:  named-entity-recognition
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+70%)
Mutual labels:  named-entity-recognition
ner-d
Python module for Named Entity Recognition (NER) using natural language processing.
Stars: ✭ 14 (-72%)
Mutual labels:  named-entity-recognition
eve-bot
EVE bot, a customer service chatbot to enhance virtual engagement for Twitter Apple Support
Stars: ✭ 31 (-38%)
Mutual labels:  named-entity-recognition
react-taggy
A simple zero-dependency React component for tagging user-defined entities within a block of text.
Stars: ✭ 29 (-42%)
Mutual labels:  named-entity-recognition
AlpacaTag
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
Stars: ✭ 126 (+152%)
Mutual labels:  named-entity-recognition
nlp ner workshop
Named-Entity-Recognition Workshop
Stars: ✭ 15 (-70%)
Mutual labels:  named-entity-recognition
nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
Stars: ✭ 23 (-54%)
Mutual labels:  named-entity-recognition
named-entity-recognition-template
Build a deep learning model for predicting the named entities from text.
Stars: ✭ 51 (+2%)
Mutual labels:  named-entity-recognition
LNEx
📍 🏢 🏦 🏣 🏪 🏬 LNEx: Location Name Extractor
Stars: ✭ 21 (-58%)
Mutual labels:  named-entity-recognition
huner
Named Entity Recognition for biomedical entities
Stars: ✭ 44 (-12%)
Mutual labels:  named-entity-recognition
NER corpus chinese
Chinese corpora for NER (named entity recognition), gathered in one place
Stars: ✭ 102 (+104%)
Mutual labels:  named-entity-recognition

CLNER

The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER is a framework for improving the accuracy of NER models by retrieving external contexts and then applying cooperative learning to improve both input views. The code is initially based on flair version 0.4.3 and is extended with knowledge distillation and ACE approaches to distill smaller models or achieve state-of-the-art results. The config files in those repos are also applicable to this code.
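
As a rough illustration of the cooperative learning idea described above, here is a minimal PyTorch sketch (not the repository's implementation): it assumes a tagger that produces per-token label scores for two views of the same sentence, one with retrieved external context and one without, and encourages the two views to agree either in the output distributions (KL, cf. config/wnut17_doc_cl_kl.yaml) or in the token representations (L2, cf. config/wnut17_doc_cl_l2.yaml).

import torch.nn.functional as F

def cooperative_kl_loss(logits_with_ctx, logits_without_ctx):
    # Symmetric KL between the label distributions predicted for the
    # context-augmented view and the original-sentence view.
    # Shapes: (num_tokens, num_labels). Illustrative sketch only.
    log_p = F.log_softmax(logits_with_ctx, dim=-1)
    log_q = F.log_softmax(logits_without_ctx, dim=-1)
    kl_pq = F.kl_div(log_q, log_p.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(log_p, log_q.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

def cooperative_l2_loss(feats_with_ctx, feats_without_ctx):
    # L2 distance between the token representations of the two views.
    return F.mse_loss(feats_with_ctx, feats_without_ctx)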


Guide

Requirements

The project is based on PyTorch 1.1+ and Python 3.6+. To install the required dependencies, run:

pip install -r requirements.txt

Datasets

The datasets used in our paper are available here.

Training

Training NER Models with External Contexts

Run:

CUDA_VISIBLE_DEVICES=0 python train.py --config config/wnut17_doc.yaml

Training NER Models with Cooperative Learning

Run:

CUDA_VISIBLE_DEVICES=0 python train.py --config config/wnut17_doc_cl_kl.yaml
CUDA_VISIBLE_DEVICES=0 python train.py --config config/wnut17_doc_cl_l2.yaml

Train on Your Own Dataset

To configure the dataset manually, set it in the $config_file as follows:

targets: ner
ner:
  Corpus: ColumnCorpus-1
  ColumnCorpus-1: 
    data_folder: datasets/conll_03_english
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl
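
The data files under data_folder then follow a one-token-per-line column layout matching column_format above, with sentences separated by blank lines. A hypothetical CoNLL-03-style fragment (made-up sentence, BIO tags) looks like:

John NNP B-NP B-PER
lives VBZ B-VP O
in IN B-PP O
Berlin NNP B-NP B-LOC
. . O O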

The tag_dictionary is a path to the tag dictionary for the task. If the path does not exist, the code will generate a tag dictionary at that path automatically. The dataset format is Corpus: $CorpusClassName-$id, where $id is a name for the dataset (anything you like). You can also train on multiple datasets jointly, for example as in the sketch below.
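
A hypothetical sketch of joint training on two corpora (the ColumnCorpus-2 entry and its data_folder are placeholders; the shipped files under config/ are the authoritative examples):

targets: ner
ner:
  Corpus: ColumnCorpus-1:ColumnCorpus-2
  ColumnCorpus-1:
    data_folder: datasets/conll_03_english
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  ColumnCorpus-2:
    data_folder: datasets/your_second_dataset
    column_format:
      0: text
      1: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl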

Please refer to Config File for more details.

Parse files

If you want to parse a certain file, include train in the file name and put the file in a directory $dir (for example, parse_file_dir/train.your_file_name). Run:

CUDA_VISIBLE_DEVICES=0 python train.py --config $config_file --parse --target_dir $dir --keep_order

The file format should be column_format={0: 'text', 1: 'ner'} for sequence labeling, or you can modify line 232 in train.py. The parsed results will be written to outputs/. Note that you may need to preprocess your file with dummy tags for prediction; please check this issue for more details.
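
For illustration, a hypothetical two-column parse file in that format, using O as the dummy tag on every token since gold labels are unknown at prediction time:

Barack O
Obama O
visited O
Paris O
. O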

Config File

The config files are in YAML format. The keys are described below, and an illustrative end-to-end sketch follows the list.

  • targets: The target task
    • ner: named entity recognition
    • upos: part-of-speech tagging
    • chunk: chunking
    • ast: abstract extraction
    • dependency: dependency parsing
    • enhancedud: semantic dependency parsing/enhanced universal dependency parsing
  • ner: An example for the targets. If targets: ner, then the code will read the values with the key of ner.
    • Corpus: The training corpora for the model; use : to separate different corpora.
    • tag_dictionary: A path to the tag dictionary for the task. If the path does not exist, the code will generate a tag dictionary at the path automatically.
  • target_dir: Save directory.
  • model_name: The trained models will be saved in $target_dir/$model_name.
  • model: The model to train, depending on the task.
    • FastSequenceTagger: Sequence labeling model. The values are the parameters.
    • SemanticDependencyParser: Syntactic/semantic dependency parsing model. The values are the parameters.
  • embeddings: The embeddings for the model, each key is the class name of the embedding and the values of the key are the parameters, see flair/embeddings.py for more details. For each embedding, use $classname-$id to represent the class. For example, if you want to use BERT and M-BERT for a single model, you can name: TransformerWordEmbeddings-0, TransformerWordEmbeddings-1.
  • trainer: The trainer class.
    • ModelFinetuner: The trainer for fine-tuning embeddings or simply training a task model without ACE.
    • ReinforcementTrainer: The trainer for training ACE.
  • train: The parameters for the train function of the trainer (for example, ReinforcementTrainer.train()).
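
As referenced above, here is an illustrative end-to-end config sketch. The model, embedding, and training hyperparameters (hidden_size, use_crf, bert-base-cased, learning_rate, and so on) are assumptions following flair's conventions rather than values taken from this repository; see the shipped files under config/ for working settings.

targets: ner
ner:
  Corpus: ColumnCorpus-1
  ColumnCorpus-1:
    data_folder: datasets/conll_03_english
    column_format:
      0: text
      1: pos
      2: chunk
      3: ner
    tag_to_bioes: ner
  tag_dictionary: resources/taggers/your_ner_tags.pkl
target_dir: resources/taggers/
model_name: example_bert_ner
model:
  FastSequenceTagger:
    hidden_size: 256
    use_crf: true
embeddings:
  TransformerWordEmbeddings-0:
    model: bert-base-cased
    layers: '-1'
  TransformerWordEmbeddings-1:
    model: bert-base-multilingual-cased
    layers: '-1'
trainer: ModelFinetuner
train:
  learning_rate: 5.0e-6
  mini_batch_size: 32
  max_epochs: 10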

Citing Us

If you find the code helpful, please cite:

@inproceedings{wang2021improving,
    title = "{{Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning}}",
    author={Wang, Xinyu and Jiang, Yong and Bach, Nguyen and Wang, Tao and Huang, Zhongqiang and Huang, Fei and Tu, Kewei},
    booktitle = "{the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (\textbf{ACL-IJCNLP 2021})}",
    month = aug,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}

Contact

Feel free to post your questions or comments as issues or email Xinyu Wang.
