PlusLabNLP / GEANet-BioMed-Event-Extraction

Licence: MIT license
Code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs

Biomedical Event Extraction with Hierarchical Knowledge Graphs

Introduction

This repo hosts the code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs. We represent knowledge from UMLS in hierarchical knowledge graphs and integrate it into pre-trained contextual representations with a proposed graph neural network, Graph Edge-conditioned Attention Networks (GEANet). Currently, only the SciBERT baseline is available; our best-performing model, GEANet-SciBERT, will be released soon.

| Model | Dev Set F1 | Test Set F1 |
|---|---|---|
| SciBERT Baseline | 59.33 | 58.50 |
| GEANet-SciBERT | 60.38 | 60.06 |
| Previous SOTA | N/A | 58.65 |
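
The gains implied by these scores can be checked with simple arithmetic. The sketch below only restates the test-set numbers from the table; the dictionary and the `f1_gain` helper are illustrative names, not part of this repo.

```python
# Test-set F1 scores copied from the results table above.
test_f1 = {
    "SciBERT Baseline": 58.50,
    "GEANet-SciBERT": 60.06,
    "Previous SOTA": 58.65,
}


def f1_gain(model: str, reference: str) -> float:
    """Absolute F1 difference (in points) between two rows of the table."""
    return round(test_f1[model] - test_f1[reference], 2)


print(f1_gain("GEANet-SciBERT", "Previous SOTA"))     # 1.41
print(f1_gain("GEANet-SciBERT", "SciBERT Baseline"))  # 1.56
```

The 1.41-point gain over the previous SOTA matches the improvement on all events reported in the paper abstract.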

CORD-19

With the increasing concern about the COVID-19 pandemic, researchers have been putting much effort into providing useful insights into the COVID-19 Open Research Dataset Challenge (CORD-19). This repo also demonstrates how we extract biomedical events with SciBERT, a BERT model trained on a scientific corpus, which was fine-tuned on the GENIA BioNLP shared task 2011. We follow the pipeline described in Bjorne et al., which can be broken into three stages: trigger detection, edge/argument detection and unmerging. The extracted events can be found here.

In addition, we adopted the framework from Han et al., where trigger detection and edge detection are trained jointly in a multitask setting.

Dependencies

Our experiments were run with Python 3.6.10, PyTorch 1.4.0 and CUDA 10.1 on a CentOS machine. Please follow the instructions for GPU dependencies and usage on the official PyTorch website. We used the NER model en_ner_jnlpba_md provided by ScispaCy for tagging biomedical entities.

Run pip install -r requirements to install all the required packages.

(Optional) If you would like to use the knowledge incorporation component, the additional packages required for PyTorch Geometric can be installed as follows:

$ pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html

Here, ${CUDA} should be replaced by the tag for your installed CUDA version (e.g. cpu, cu92, cu101 or cu102).
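
If you are unsure which tag applies, one option is to derive it from the CUDA version your PyTorch build reports. The following is a minimal sketch, not part of this repo: `cuda_wheel_tag` is a hypothetical helper, and it assumes the tag is simply `cu` followed by the major and minor version digits, as in the examples above.

```python
from typing import Optional


def cuda_wheel_tag(cuda_version: Optional[str]) -> str:
    """Map a CUDA version string such as '10.1' to a tag such as 'cu101'.

    None (what a CPU-only PyTorch build reports) maps to 'cpu'.
    """
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


if __name__ == "__main__":
    try:
        import torch  # optional; only used to read the installed CUDA version

        print(cuda_wheel_tag(torch.version.cuda))
    except ImportError:
        print("PyTorch is not installed; pass the CUDA version manually.")
```

The printed value can then be substituted for ${CUDA} in the pip commands above.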

The trained model can be downloaded from here. You need to decompress it before running the program.

Making predictions

On the CORD-19 dataset

  1. Run through Preprocess CORD.ipynb to generate the processed .txt files in genia_cord_19 from the JSON files in the custom_license/custom_license/pmc_json/ directory.
  2. Type . run_multitask_bert.sh in the terminal to run the whole event extraction pipeline and write the event annotations to genia_cord_19_output.
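
For readers who want a rough idea of what the first step involves, the sketch below converts CORD-19 JSON files into plain-text files. It is an illustration only and may differ from what the notebook actually does: it assumes each JSON file follows the public CORD-19 layout with a `paper_id` field and a `body_text` list of paragraphs that each carry a `text` field, and the function names are hypothetical.

```python
import json
from pathlib import Path


def paper_to_text(paper: dict) -> str:
    """Join the non-empty body paragraphs of one CORD-19 paper into plain text."""
    paragraphs = [p["text"].strip() for p in paper.get("body_text", [])]
    return "\n".join(p for p in paragraphs if p)


def convert_directory(json_dir: str, out_dir: str) -> int:
    """Write one .txt file per JSON paper; return the number of files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            paper = json.load(handle)
        text = paper_to_text(paper)
        if text:  # skip papers without any body text
            name = paper.get("paper_id", path.stem)
            (out / f"{name}.txt").write_text(text, encoding="utf-8")
            written += 1
    return written
```

Under these assumptions, `convert_directory("custom_license/custom_license/pmc_json", "genia_cord_19")` would populate the input directory used by the pipeline script.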

On a single biomedical sentence

Call the biomedical_evet_extraction function in the predict.py script. It takes a single string as its only parameter and runs the event extraction pipeline on that string.

>>> biomedical_evet_extraction("BMP-6 inhibits growth of mature human B cells; induction of Smad phosphorylation and upregulation of Id1.")
[{'tokens': ['BMP-6', 'inhibits', 'growth', 'of', 'mature', 'human', 'B', 'cells', ';', 'induction', 'of', 'Smad', 'phosphorylation', 'and', 'upregulation', 'of', 'Id1', '.'], 'events': [{'event_type': 'Positive_regulation', 'triggers': [{'event_type': 'Positive_regulation', 'text': 'upregulation', 'start_token': 14, 'end_token': 14}], 'arguments': [{'role': 'Theme', 'text': 'Id1', 'start_token': 16, 'end_token': 16}]}], 'ner': [[0, 0, 'PROTEIN'], [4, 7, 'CELL_TYPE'], [11, 11, 'PROTEIN'], [16, 16, 'PROTEIN']]}]
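
The returned structure is a list with one dictionary per sentence. As a minimal sketch of how it might be consumed, the code below walks a copy of the example output shown above. The helpers `summarize_events` and `entity_spans` are hypothetical names, not functions from this repo, and the inclusive end index is inferred from the example (tokens 4 to 7 spell out "mature human B cells").

```python
# Copy of the example result shown above.
result = [{
    "tokens": ["BMP-6", "inhibits", "growth", "of", "mature", "human", "B",
               "cells", ";", "induction", "of", "Smad", "phosphorylation",
               "and", "upregulation", "of", "Id1", "."],
    "events": [{
        "event_type": "Positive_regulation",
        "triggers": [{"event_type": "Positive_regulation",
                      "text": "upregulation",
                      "start_token": 14, "end_token": 14}],
        "arguments": [{"role": "Theme", "text": "Id1",
                       "start_token": 16, "end_token": 16}],
    }],
    "ner": [[0, 0, "PROTEIN"], [4, 7, "CELL_TYPE"],
            [11, 11, "PROTEIN"], [16, 16, "PROTEIN"]],
}]


def summarize_events(sentences):
    """Return one readable line per extracted event."""
    lines = []
    for sentence in sentences:
        for event in sentence["events"]:
            triggers = ", ".join(t["text"] for t in event["triggers"])
            args = ", ".join(f'{a["role"]}={a["text"]}' for a in event["arguments"])
            lines.append(f'{event["event_type"]}({triggers}): {args}')
    return lines


def entity_spans(sentence):
    """Return (text, label) pairs, treating the end token index as inclusive."""
    tokens = sentence["tokens"]
    return [(" ".join(tokens[start:end + 1]), label)
            for start, end, label in sentence["ner"]]


print(summarize_events(result))
print(entity_spans(result[0]))
```

For the example sentence this yields a single Positive_regulation event triggered by "upregulation" with Id1 as its Theme, plus four tagged entities.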

Project Structure

├── genia_cord_19
├── genia_cord_19_output
├── preprocessed_data
├── eval
└── weights

Citation

@inproceedings{huang-etal-2020-biomedical,
    title = "Biomedical Event Extraction with Hierarchical Knowledge Graphs",
    author = "Huang, Kung-Hsiang  and
      Yang, Mu  and
      Peng, Nanyun",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.114",
    doi = "10.18653/v1/2020.findings-emnlp.114",
    pages = "1277--1285",
    abstract = "Biomedical event extraction is critical in understanding biomolecular interactions described in scientific corpus. One of the main challenges is to identify nested structured events that are associated with non-indicative trigger words. We propose to incorporate domain knowledge from Unified Medical Language System (UMLS) to a pre-trained language model via Graph Edge-conditioned Attention Networks (GEANet) and hierarchical graph representation. To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS. The grounded graphs are then propagated by GEANet, a novel graph neural networks for enhanced capabilities in inferring complex events. On BioNLP 2011 GENIA Event Extraction task, our approach achieved 1.41{\%} F1 and 3.19{\%} F1 improvements on all events and complex events, respectively. Ablation studies confirm the importance of GEANet and hierarchical KG.",
}