dmis-lab / bern

License: BSD-2-Clause
A neural named entity recognition and multi-type normalization tool for biomedical text mining

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to bern

knowledge-graph-nlp-in-action
From model training to deployment: hands-on Knowledge Graph and Natural Language Processing (NLP). Involves TensorFlow, BERT+Bi-LSTM+CRF, Neo4j, etc., and covers tasks such as Named Entity Recognition, Text Classification, Information Extraction, and Relation Extraction.
Stars: ✭ 58 (-61.59%)
Mutual labels:  named-entity-recognition, bert
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 1,871 (+1139.07%)
Mutual labels:  named-entity-recognition, bert
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (-43.71%)
Mutual labels:  named-entity-recognition, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1567.55%)
Mutual labels:  named-entity-recognition, bert
BERTOverflow
A Pre-trained BERT on StackOverflow Corpus
Stars: ✭ 40 (-73.51%)
Mutual labels:  named-entity-recognition, bert
DeepNER
An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.
Stars: ✭ 9 (-94.04%)
Mutual labels:  named-entity-recognition, bert
Bert Bilstm Crf Ner
Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services
Stars: ✭ 3,838 (+2441.72%)
Mutual labels:  named-entity-recognition, bert
Kashgari
Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
Stars: ✭ 2,235 (+1380.13%)
Mutual labels:  named-entity-recognition, bert
OpenUE
OpenUE is a lightweight knowledge graph extraction tool (An Open Toolkit for Universal Extraction from Text, published at EMNLP 2020: https://aclanthology.org/2020.emnlp-demos.1.pdf)
Stars: ✭ 274 (+81.46%)
Mutual labels:  named-entity-recognition, bert
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+23.18%)
Mutual labels:  named-entity-recognition, bert
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification tasks for long and short Chinese text, and sequence annotation tasks such as Chinese named entity recognition, part of speech ta…
Stars: ✭ 151 (+0%)
Mutual labels:  named-entity-recognition, bert
eve-bot
EVE bot, a customer service chatbot to enhance virtual engagement for Twitter Apple Support
Stars: ✭ 31 (-79.47%)
Mutual labels:  named-entity-recognition
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (-37.09%)
Mutual labels:  bert
BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (-24.5%)
Mutual labels:  named-entity-recognition
AlpacaTag
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
Stars: ✭ 126 (-16.56%)
Mutual labels:  named-entity-recognition
presidio-research
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
Stars: ✭ 62 (-58.94%)
Mutual labels:  named-entity-recognition
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (-40.4%)
Mutual labels:  bert
nervaluate
Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13
Stars: ✭ 40 (-73.51%)
Mutual labels:  named-entity-recognition
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (-60.26%)
Mutual labels:  bert
nested-ner-tacl2020-flair
Implementation of Nested Named Entity Recognition using Flair
Stars: ✭ 23 (-84.77%)
Mutual labels:  named-entity-recognition

BERN

BERN is a BioBERT-based multi-type NER tool that also supports normalization of extracted entities. This repository contains the official implementation of BERN. You can use BERN at https://bern.korea.ac.kr, or host your own server by following the description below. Please refer to our paper (Kim et al., IEEE Access 2019) for more details. This project was developed by the DMIS Laboratory at Korea University.

[Updates]

***** Check out BERN2, an improved version of BERN with much faster and more accurate inference! *****

Fixed our gene normalizer to respond to issues between 2020-03-12 and 2020-03-13

  1. Download gnormplus-normalization_19.jar at this URL and place (overwrite) the file under normalization/resources/normalizers/gene directory.
  2. Stop normalizers by running stop_normalizers.sh
  3. Start the normalizers by running load_dicts.sh

Done - Server down due to air conditioning problems in our server room 2019-10-10 - 2019-10-11 7:55 AM (UTC-0)

Fixed our disease normalizer in response to the 2019-08-02, 2019-08-10 and 2019-08-19 issues

  1. Download disease_normalizer_19.jar at this URL and place the file under normalization/resources/normalizers/disease directory.
  2. Stop normalizers by running stop_normalizers.sh and restart the normalizers by running load_dicts.sh

Done - Server check 2019-07-18 8:20 AM - 1:30 PM (UTC-0)

Overview of BERN.

The description below gives instructions on hosting your own BERN. Please refer to https://bern.korea.ac.kr for the RESTful Web service of BERN.

Requirements

Note that you will need at least 66 GB of free disk space and 32 GB or more RAM.
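
As a quick pre-flight check, the following minimal sketch compares a host against the figures above. It is not part of BERN: the function names are hypothetical, "GB" is interpreted as GiB (an assumption), and the RAM probe relies on `os.sysconf` names that are available on Linux.

```python
import os
import shutil

GIB = 1024 ** 3
MIN_FREE_DISK_GIB = 66  # figure taken from the requirement above
MIN_RAM_GIB = 32        # figure taken from the requirement above


def meets_requirements(free_disk_bytes, total_ram_bytes):
    """Return (disk_ok, ram_ok) for the stated minimums (GB read as GiB: assumption)."""
    disk_ok = free_disk_bytes >= MIN_FREE_DISK_GIB * GIB
    ram_ok = total_ram_bytes >= MIN_RAM_GIB * GIB
    return disk_ok, ram_ok


def probe_host(path):
    """Measure free disk space at `path` and total physical RAM (Linux sysconf names)."""
    free_disk = shutil.disk_usage(path).free
    total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free_disk, total_ram


if __name__ == "__main__":
    # The installation steps place BERN under the home directory, so probe that file system.
    disk_ok, ram_ok = meets_requirements(*probe_host(os.path.expanduser("~")))
    print("free disk ok:", disk_ok, "| RAM ok:", ram_ok)
```

Some machines report slightly less physical memory than their nominal size, so a borderline "RAM ok: False" may deserve a manual look.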

Installation

  • Clone this repo
cd
git clone https://github.com/dmis-lab/bern.git
  • Install python packages
cd ~/bern
pip3 install -r requirements.txt --user
  • Install GNormPlus & run GNormPlusServer.jar
cd ~/bern
wget https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/download/GNormPlus/GNormPlusJava.zip
unzip GNormPlusJava.zip

cd GNormPlusJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install

cd ..
chmod 764 Ab3P
# chmod 764 CRF/crf_test

# Set FocusSpecies to 9606 (Human)
sed -i 's/= All/= 9606/g' setup.txt; echo "FocusSpecies: from All to 9606 (Human)"
sh Installation.sh

rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download GNormPlusServer.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1g-JlhqeDIlZX5YFk8Y27_M8BXUXcQRSX" -O GNormPlusServer.jar && rm -rf /tmp/cookies.txt

# Start GNormPlusServer
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &
  • Install tmVar2 & run tmVar2Server.jar
cd ~/bern
wget ftp://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/tmVar2/tmVarJava.zip
unzip tmVarJava.zip

cd tmVarJava
wget -O ./crfpp-0.58.tar.gz https://drive.google.com/uc?id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ
tar xvfz crfpp-0.58.tar.gz
cp -rf CRF++-0.58/* CRF
cd CRF
sh ./configure
make
sudo make install

cd ..
chmod 764 CRF/crf_test

sh Installation.sh

rm -r CRF++-0.58
rm crfpp-0.58.tar.gz

# Download tmVar2Server.jar
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1kQYzLHLFLsU9qKpRRGjXkIYmaYK6bPJm" -O tmVar2Server.jar && rm -rf /tmp/cookies.txt

# Download dependencies
wget https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.20.0/sqlite-jdbc-3.20.0.jar
wget https://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2.jar

# Start tmVar2Server
nohup java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896 >> ~/bern/logs/nohup_tmvar.out 2>&1 &
  • Download normalization resources and pre-trained BioBERT NER models
cd ~/bern/scripts
sh download_norm.sh
sh download_biobert_ner_models.sh
  • Run named entity normalizers
cd ..
sh load_dicts.sh
  • Run BERN server
# Check your GPU number(s)
echo $CUDA_VISIBLE_DEVICES

# Set your GPU number(s)
export CUDA_VISIBLE_DEVICES=0

# Run BERN
# Please check gnormplus_home directory and tmvar2_home directory.
nohup python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896 >> logs/nohup_BERN.out 2>&1 &

# Print logs
tail -F logs/nohup_BERN.out
  • Usage
    • PMID(s) (HTTP GET)
      • http://<YOUR_SERVER_ADDRESS>:8888/?pmid=<a PMID or comma-separated PMIDs>&format=<json or pubtator>
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607&format=json&indent=true
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607&format=pubtator
      • Example: http://<YOUR_SERVER_ADDRESS>:8888/?pmid=30429607,29446767&format=json&indent=true
    • Raw text (HTTP POST)
      • POST Address: http://<YOUR_SERVER_ADDRESS>:8888
      • Set the key and value of the request body as follows:
      import requests
      import json
      body_data = {"param": json.dumps({"text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome."})}
      response = requests.post('http://<YOUR_SERVER_ADDRESS>:8888', data=body_data)
      result_dict = response.json()
      print(result_dict)
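
The PMID (HTTP GET) requests listed above can be sketched in Python as well. This is a minimal illustration using only the standard library; `build_pmid_url` and `fetch_pmids` are hypothetical helper names, and the parameters (`pmid`, `format`, `indent`) and port 8888 are taken from the example URLs above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_pmid_url(server, pmids, fmt="json", indent=False, port=8888):
    """Build a GET URL in the shape of the examples above."""
    if fmt not in ("json", "pubtator"):
        raise ValueError("format must be 'json' or 'pubtator'")
    # Multiple PMIDs are joined with commas, as in the last example URL.
    params = {"pmid": ",".join(str(p) for p in pmids), "format": fmt}
    if indent:
        params["indent"] = "true"
    # safe="," keeps the commas between PMIDs unescaped.
    return "http://{}:{}/?{}".format(server, port, urlencode(params, safe=","))


def fetch_pmids(server, pmids, port=8888):
    """Request JSON annotations for the given PMIDs from a running BERN server."""
    url = build_pmid_url(server, pmids, fmt="json", port=port)
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_pmids() once your server is up.
    print(build_pmid_url("localhost", [30429607, 29446767], indent=True))
```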
      

Result

See a result example in JSON (PMID:29446767)
[
    {
        "denotations": [
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 0,
                    "end": 13
                }
            },
            {
                "id": [
                    "MIM:171834",
                    "HGNC:8975",
                    "Ensembl:ENSG00000121879",
                    "BERN:324295302"
                ],
                "obj": "gene",
                "span": {
                    "begin": 53,
                    "end": 58
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 133,
                    "end": 146
                }
            },
            {
                "id": [
                    "MESH:D014652",
                    "BERN:256572101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 158,
                    "end": 174
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 193,
                    "end": 231
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 234,
                    "end": 288
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 589,
                    "end": 593
                }
            },
            {
                "id": [
                    "MIM:171834",
                    "HGNC:8975",
                    "Ensembl:ENSG00000121879",
                    "BERN:324295302"
                ],
                "obj": "gene",
                "span": {
                    "begin": 748,
                    "end": 758
                }
            },
            {
                "id": [
                    "CUI-less"
                ],
                "mutationType": "ProteinMutation",
                "normalizedName": "p.F83S;CorrespondingGene:5290",
                "obj": "mutation",
                "span": {
                    "begin": 857,
                    "end": 866
                }
            },
            {
                "id": [
                    "BERN:257523801"
                ],
                "obj": "disease",
                "span": {
                    "begin": 906,
                    "end": 928
                }
            },
            {
                "id": [
                    "CUI-less"
                ],
                "obj": "gene",
                "span": {
                    "begin": 1009,
                    "end": 1024
                }
            },
            {
                "id": [
                    "MESH:C567763",
                    "BERN:262813101"
                ],
                "obj": "disease",
                "span": {
                    "begin": 1043,
                    "end": 1047
                }
            }
        ],
        "elapsed_time": {
            "ner": 0.611,
            "normalization": 0.218,
            "tmtool": 1.281,
            "total": 2.111
        },
        "project": "BERN",
        "sourcedb": "PubMed",
        "sourceid": "29446767",
        "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations and delineation of the natural history and phenotype. PURPOSE: CLAPO syndrome is a rare vascular disorder characterized by capillary malformation of the lower lip, lymphatic malformation predominant on the face and neck, asymmetry, and partial/generalized overgrowth. Here we tested the hypothesis that, although the genetic cause is not known, the tissue distribution of the clinical manifestations in CLAPO seems to follow a pattern of somatic mosaicism. METHODS: We clinically evaluated a cohort of 13 patients with CLAPO and screened 20 DNA blood/tissue samples from 9 patients using high-throughput, deep sequencing. RESULTS: We identified five activating mutations in the PIK3CA gene in affected tissues from 6 of the 9 patients studied; one of the variants (NM_006218.2:c.248T>C; p.Phe83Ser) has not been previously described in developmental disorders. CONCLUSION: We describe for the first time the presence of somatic activating PIK3CA mutations in patients with CLAPO. We also report an update of the phenotype and natural history of the syndrome.",
        "timestamp": "Thu Jul 04 06:15:27 +0000 2019"
    }
]
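
To read such a result, each denotation's span can be mapped back onto the "text" field. The sketch below is a hedged illustration, not BERN code: `extract_mentions` is a hypothetical helper, and it assumes the "end" offset is inclusive, which is inferred from the example above (the span 53 to 58 covers the six characters of "PIK3CA").

```python
def extract_mentions(doc):
    """Return one dict per denotation with its type, IDs, and surface string."""
    text = doc["text"]
    mentions = []
    for denotation in doc["denotations"]:
        begin = denotation["span"]["begin"]
        end = denotation["span"]["end"]
        mentions.append({
            "type": denotation["obj"],
            "ids": denotation["id"],
            # end + 1 because the example spans appear to use an inclusive end offset.
            "mention": text[begin:end + 1],
        })
    return mentions


if __name__ == "__main__":
    # A shortened document built from the first two denotations of the example above.
    sample = {
        "text": "CLAPO syndrome: identification of somatic activating PIK3CA mutations",
        "denotations": [
            {"id": ["MESH:C567763", "BERN:262813101"], "obj": "disease",
             "span": {"begin": 0, "end": 13}},
            {"id": ["MIM:171834", "HGNC:8975"], "obj": "gene",
             "span": {"begin": 53, "end": 58}},
        ],
    }
    for mention in extract_mentions(sample):
        print(mention["type"], "|", mention["mention"], "|", ", ".join(mention["ids"]))
```

Because the server returns a list of documents, a full response would be processed by calling the helper once per list element.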

Restart

# Start GNormPlusServer
cd ~/bern/GNormPlusJava
nohup java -Xmx16G -Xms16G -jar GNormPlusServer.jar 18895 >> ~/bern/logs/nohup_gnormplus.out 2>&1 &

# Start tmVar2Server
cd ~/bern/tmVarJava
nohup java -Xmx8G -Xms8G -jar tmVar2Server.jar 18896 >> ~/bern/logs/nohup_tmvar.out 2>&1 &

# Start normalizers
cd ~/bern/
sh load_dicts.sh

# Check your GPU number(s)
echo $CUDA_VISIBLE_DEVICES

# Set your GPU number(s)
export CUDA_VISIBLE_DEVICES=0

# Run BERN
nohup python3 -u server.py --port 8888 --gnormplus_home ~/bern/GNormPlusJava --gnormplus_port 18895 --tmvar2_home ~/bern/tmVarJava --tmvar2_port 18896 >> logs/nohup_BERN.out 2>&1 &

# Print logs
tail -F logs/nohup_BERN.out
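
After a restart it can help to confirm that each service is accepting connections. The following is a minimal sketch under stated assumptions: the ports (18895, 18896, 8888) are the ones used in the commands above, the helper names are hypothetical, and an open TCP port only shows that a process is listening, not that it has finished loading its models.

```python
import socket

# Ports as used in the commands above; adjust if you started the servers differently.
SERVICES = {"GNormPlusServer": 18895, "tmVar2Server": 18896, "BERN": 8888}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=None):
    """Map each service name to whether its port accepts connections."""
    services = SERVICES if services is None else services
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print("{:<16} {}".format(name, "listening" if is_up else "not reachable"))
```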

Troubleshooting

Monitoring

  • List processes (every 5s)
watch -n 5 "ps auxww | egrep 'python|java|node' | grep -v grep"
  • Periodic HTTPS GET checker

    • Permission setting
    chmod +x scripts/bern_checker.sh
    
    • crontab (every 30 min)
    crontab -e
    */30 * * * * /home/<YOUR_ACCOUNT>/bern/scripts/bern_checker.sh >> /home/<YOUR_ACCOUNT>/bern/logs/bern_checker.out 2>&1
    

Bug report

Add a new issue to https://github.com/dmis-lab/bern/issues

Contact

[email protected]

Citation

  • Please cite the following two papers if you use BERN in your work.
@article{kim2019neural,
  title={A Neural Named Entity Recognition and Multi-Type Normalization Tool for Biomedical Text Mining},
  author={Kim, Donghyeon and Lee, Jinhyuk and So, Chan Ho and Jeon, Hwisang and Jeong, Minbyul and Choi, Yonghwa and Yoon, Wonjin and Sung, Mujeen and Kang, Jaewoo},
  journal={IEEE Access},
  volume={7},
  pages={73729--73740},
  year={2019},
  publisher={IEEE}
}

@article{10.1093/bioinformatics/btz682,
    author = {Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
    title = "{BioBERT: a pre-trained biomedical language representation model for biomedical text mining}",
    journal = {Bioinformatics},
    year = {2019},
    month = {09},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz682},
    url = {https://doi.org/10.1093/bioinformatics/btz682},
}