
hu-ner / huner

Licence: other
Named Entity Recognition for biomedical entities

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects
perl
6916 projects
Dockerfile
14818 projects

Projects that are alternatives to or similar to huner

CrossNER
CrossNER: Evaluating Cross-Domain Named Entity Recognition (AAAI-2021)
Stars: ✭ 87 (+97.73%)
Mutual labels:  named-entity-recognition, corpora, ner
anonymization-api
How to build and deploy an anonymization API with FastAPI
Stars: ✭ 51 (+15.91%)
Mutual labels:  named-entity-recognition, ner
molminer
Python library and command-line tool for extracting compounds from scientific literature. Written in Python.
Stars: ✭ 38 (-13.64%)
Mutual labels:  named-entity-recognition, ner
NER corpus chinese
Chinese corpora for NER (named entity recognition), available in one place
Stars: ✭ 102 (+131.82%)
Mutual labels:  named-entity-recognition, ner
SynLSTM-for-NER
Code and models for the paper titled "Better Feature Integration for Named Entity Recognition", NAACL 2021.
Stars: ✭ 26 (-40.91%)
Mutual labels:  named-entity-recognition, ner
scikitcrf NER
Python library for custom entity recognition using Sklearn CRF
Stars: ✭ 17 (-61.36%)
Mutual labels:  named-entity-recognition, ner
nalaf
NLP framework in python for entity recognition and relationship extraction
Stars: ✭ 104 (+136.36%)
Mutual labels:  ner, bionlp
react-taggy
A simple zero-dependency React component for tagging user-defined entities within a block of text.
Stars: ✭ 29 (-34.09%)
Mutual labels:  named-entity-recognition, ner
mitie-ruby
Named-entity recognition for Ruby
Stars: ✭ 77 (+75%)
Mutual labels:  named-entity-recognition, ner
lingvo--Ner-ru
Named entity recognition (NER) in Russian texts
Stars: ✭ 38 (-13.64%)
Mutual labels:  named-entity-recognition, ner
TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Stars: ✭ 84 (+90.91%)
Mutual labels:  named-entity-recognition, ner
presidio-research
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
Stars: ✭ 62 (+40.91%)
Mutual labels:  named-entity-recognition, ner
NER-and-Linking-of-Ancient-and-Historic-Places
An NER tool for ancient place names based on Pleiades and Spacy.
Stars: ✭ 26 (-40.91%)
Mutual labels:  named-entity-recognition, ner
ner-d
Python module for Named Entity Recognition (NER) using natural language processing.
Stars: ✭ 14 (-68.18%)
Mutual labels:  named-entity-recognition, ner
neural name tagging
Code for "Reliability-aware Dynamic Feature Composition for Name Tagging" (ACL2019)
Stars: ✭ 39 (-11.36%)
Mutual labels:  named-entity-recognition, ner
PhoNER COVID19
COVID-19 Named Entity Recognition for Vietnamese (NAACL 2021)
Stars: ✭ 55 (+25%)
Mutual labels:  named-entity-recognition, ner
Ner Bert Pytorch
PyTorch solution of named entity recognition task Using Google AI's pre-trained BERT model.
Stars: ✭ 249 (+465.91%)
Mutual labels:  named-entity-recognition, ner
KoBERT-NER
NER Task with KoBERT (with Naver NLP Challenge dataset)
Stars: ✭ 76 (+72.73%)
Mutual labels:  named-entity-recognition, ner
korean ner tagging challenge
KU_NERDY by Dongyub Lee and Heuiseok Lim (Gold Prize, 2017 Korean Information Processing System Competition) - Conference on Hangul and Korean Language Information Processing
Stars: ✭ 30 (-31.82%)
Mutual labels:  named-entity-recognition, ner
deep-atrous-ner
Deep-Atrous-CNN-NER: Word level model for Named Entity Recognition
Stars: ✭ 35 (-20.45%)
Mutual labels:  named-entity-recognition, ner

HUNER

We recently published HunFlair, a reimplementation of HUNER inside the Flair framework. By using language models, HunFlair considerably outperforms HUNER. In addition, as part of Flair, HunFlair is easy to install and does not depend on Docker. We recommend that all HUNER users migrate to HunFlair.

HUNER is a state-of-the-art NER model for biomedical entities. It comes with models for genes/proteins, chemicals, diseases, species and cell lines.

The code is based on the great LSTM-CRF NER tagger implementation glample/tagger by Guillaume Lample.

Content

Section Description
Installation How to install HUNER
Usage How to use HUNER
Models Available pretrained models
Corpora The HUNER Corpora

Installation

  1. Install docker
  2. Clone this repository to $dir
  3. Download the pretrained model you want to use from here, place it into $dir/models/$model_name and untar it using tar xzf $model_name
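As a concrete sketch of these steps (the model name gene_all is illustrative, and $MODEL_URL stands in for the actual download link of your chosen model; the tarball filename is an assumption):

```shell
# Clone the repository and prepare a directory for the chosen model.
git clone https://github.com/hu-ner/huner.git
cd huner
mkdir -p models/gene_all

# $MODEL_URL is a placeholder for the pretrained model's download link.
curl -L "$MODEL_URL" -o models/gene_all/gene_all.tar.gz

# Untar the model in place.
tar xzf models/gene_all/gene_all.tar.gz -C models/gene_all
```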

Usage

Tagging

To tokenize, sentence-split, and tag a file INPUT.TXT:

  1. Start the HUNER server from $dir using ./start_server $model_name. The model must reside in the directory $dir/models/$model_name.
  2. Tag text with python client.py INPUT.TXT OUTPUT.CONLL --name $model_name.

The output is written to OUTPUT.CONLL in the CoNLL2003 format.
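CoNLL2003-style output holds one token per line with its tag in the last column and a blank line between sentences. A minimal sketch of reading such a file back into Python (the exact column layout is an assumption; check your actual output):

```python
def read_conll(lines):
    """Parse CoNLL-style lines into sentences of (token, tag) pairs.

    Assumes one token per line, whitespace-separated columns with the
    token first and the NER tag last, and blank lines between sentences.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences

example = ["BRCA1 B-Gene", "mutations O", "", "p53 B-Gene", "pathway O"]
print(read_conll(example))
```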

The options for client.py are:

  • --asume_tokenized: The input is already pre-tokenized and the tokens are separated by whitespace
  • --assume_sentence_splitted: The input is already split into sentences and each line of the input contains one sentence
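When both options are used, each input line must hold one sentence with its tokens separated by whitespace. A minimal sketch of preparing such input (the naive punctuation-splitting tokenizer here is an illustration, not HUNER's own tokenizer):

```python
import re

def prepare_pretokenized(sentences):
    """Produce input for the pre-tokenized, sentence-split mode:
    one sentence per line, tokens separated by single spaces."""
    lines = []
    for sent in sentences:
        # Naive tokenization: split off punctuation as separate tokens.
        tokens = re.findall(r"\w+|[^\w\s]", sent)
        lines.append(" ".join(tokens))
    return "\n".join(lines)

print(prepare_pretokenized(["BRCA1 is a gene.", "p53 regulates apoptosis."]))
```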

Fine-tuning on a new corpus

The steps to fine-tune a base-model $base_model (e.g. gene_all) on a new corpus $corpus are:

  1. Copy the chosen base-model to a new directory, because the weight files will be updated during fine-tuning:
cp $dir/models/$base_model $dir/models/$fine_tuned_model
  2. Convert your corpus to CoNLL format and split it into train, dev, and test portions. If you don't want to use separate dev or test data, you can provide the training data as dev or test. Note, however, that without dev data results will probably suffer, because early stopping can't be performed.
  3. Fine-tune the model:
./train.sh $fine_tuned_model $corpus_train $corpus_dev $corpus_test

After successful training, $fine_tuned_model will contain the fine-tuned model and can be used exactly like the models provided by us.
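The corpus-preparation step above can be sketched as follows (the 80/10/10 split and the helper names are illustrative choices, not part of HUNER):

```python
import random

def split_conll(sentences, train=0.8, dev=0.1, seed=13):
    """Shuffle sentences and split them into train/dev/test portions."""
    sents = list(sentences)
    random.Random(seed).shuffle(sents)
    n_train = int(len(sents) * train)
    n_dev = int(len(sents) * dev)
    return (sents[:n_train],
            sents[n_train:n_train + n_dev],
            sents[n_train + n_dev:])

def write_conll(path, sentences):
    """Write sentences of (token, tag) pairs, one token per line,
    with a blank line between sentences."""
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            for token, tag in sent:
                f.write(f"{token}\t{tag}\n")
            f.write("\n")
```

For example, `train, dev, test = split_conll(corpus)` followed by three `write_conll` calls yields the files expected by `./train.sh`.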

Retraining a base-model from scratch (without fine-tuning)

To train a model from scratch without initializing it from a base-model, proceed as follows:

  1. Convert your corpus to CoNLL format and split it into train, dev, and test portions. If you don't want to use separate dev or test data, you can provide the training data as dev or test. Note, however, that without dev data results will probably suffer, because early stopping can't be performed.
  2. Train the model:
./train_no_finetune.sh $corpus_train $corpus_dev $corpus_test

After successful training, the model can be found in a newly created directory in models/. The directory name reflects the chosen hyper-parameters and usually reads like tag_scheme=iob,lower=False,zeros=False,char_dim=25....

Models

Model | Test sets P / R / F1 (%) | CRAFT P / R / F1 (%)
cellline_all | 65.09 / 67.69 / 66.08 | -
chemical_all | 83.34 / 80.26 / 81.71 | 53.56 / 35.85 / 42.95
disease_all | 75.01 / 77.71 / 76.20 | -
gene_all | 75.01 / 79.16 / 76.81 | 59.67 / 65.98 / 62.66
species_all | 85.37 / 79.98 / 82.59 | 98.51 / 73.83 / 84.40
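For reference, F1 is the harmonic mean of precision and recall. A quick helper (note that the aggregated F1 values in the table need not equal the harmonic mean of the aggregated P and R shown, since scores may be averaged over corpora):

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# chemical_all: the harmonic mean of the rounded P/R values is close
# to the reported F1 of 81.71.
print(round(f1(83.34, 80.26), 2))
```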

Corpora

For details and instructions on the HUNER corpora please refer to https://github.com/hu-ner/huner/tree/master/ner_scripts and the corresponding readme.

Citation

Please use the following bibtex entry:

@article{weber2019huner,
  title={HUNER: Improving Biomedical NER with Pretraining},
  author={Weber, Leon and M{\"u}nchmeyer, Jannes and Rockt{\"a}schel, Tim and Habibi, Maryam and Leser, Ulf},
  journal={Bioinformatics},
  year={2019}
}