thunlp / COVID19-IRQA

Licence: MIT license

Programming languages: Python, Shell, Batchfile

IR and QA Pipeline System for COVID-19

This repository is organized by THUNLP and Microsoft AI. It contains ongoing work on an information retrieval (IR) and question answering (QA) pipeline system for COVID-19, the disease caused by the novel coronavirus SARS-CoV-2. The system is trained on MS-MARCO, a large-scale reading comprehension dataset, and transferred directly to the medical domain. We hope this repository helps us work together against COVID-19.

COVID Dataset

The CORD-19 resource is constructed by the Semantic Scholar team at the Allen Institute for AI and will continue to be updated as new research is published in archival services and peer-reviewed publications. The shared task on Kaggle aims to help specialists in virology, pharmacy, and microbiology find answers to their questions about COVID-19.

IR and QA Pipeline

Document Retrieval

The following models are implemented for an effective document retrieval system.

  • BM25
  • Approximate Nearest Neighbor (ANN)
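
As background for the BM25 option, the following is a minimal, self-contained sketch of BM25 scoring in plain Python. It is not the repository's implementation (the repository builds its index with Anserini); the function name `bm25_scores`, the toy corpus, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF variant with "1 +" inside the log, so it is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical sentences, not taken from CORD-19).
docs = [
    "coronavirus incubation period is about five days".split(),
    "masks reduce transmission of respiratory viruses".split(),
    "the incubation period varies between patients".split(),
]
query = "coronavirus incubation period".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Documents that share no term with the query score zero, and rarer query terms contribute more through the IDF factor.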

Paragraph Retrieval

  • BERT (Base version of BERT with 12 layers)
  • Distilled BERT (BERT with 3 layers)

QA System

  • BERT (Base version)
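
The three stages above can be read as a retrieve, rerank, and read pipeline. The sketch below only illustrates how such stages might be chained; it is not the code in `run_pipeline.py`. All function names are hypothetical, and a simple term-overlap score stands in for BM25/ANN, the BERT ranker, and the BERT QA model.

```python
def overlap(query, text):
    """Stand-in relevance score: number of distinct query terms found in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_documents(query, corpus, k=2):
    # Stage 1 (BM25 or ANN in the repository): keep the k best documents.
    return sorted(corpus, key=lambda doc: overlap(query, doc["text"]), reverse=True)[:k]


def rank_paragraphs(query, documents, k=2):
    # Stage 2 (BERT ranker in the repository): score every paragraph
    # of the documents kept by stage 1.
    paragraphs = [
        {"title": doc["title"], "text": p}
        for doc in documents
        for p in doc["text"].split("\n")
    ]
    return sorted(paragraphs, key=lambda p: overlap(query, p["text"]), reverse=True)[:k]


def answer(query, paragraphs):
    # Stage 3 (BERT QA in the repository): this stand-in returns the top
    # paragraph whole instead of extracting an answer span.
    return [{"text": p["text"], "title": p["title"]} for p in paragraphs[:1]]


# Hypothetical corpus; paragraphs are separated by newlines.
corpus = [
    {"title": "Doc A", "text": "Masks reduce transmission.\nThe incubation period is about five days."},
    {"title": "Doc B", "text": "Vaccines train the immune system."},
    {"title": "Doc C", "text": "Hand washing removes pathogens."},
]
query = "incubation period"
documents = retrieve_documents(query, corpus)
paragraphs = rank_paragraphs(query, documents)
answers = answer(query, paragraphs)
print(answers)
```

Each stage narrows the candidate set, so the more expensive models only see what the cheaper stage before them kept.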

Keyphrase Extraction

Running Systems

Download and unzip the checkpoints, data, and index files into the models and retrieval folders, respectively. All resources are available on Tsinghua Cloud and Google Drive. Then install the required packages.

pip install -r requirements.txt

Build the BM25 index using Anserini. Download links for the collections are available in data.

./indexer/bm25_indexer/bin/IndexCollection -collection JsonCollection -es -es.index cord19 -input collection -generator LuceneDocumentGenerator -threads 1 -storePositions -storeDocvectors -storeRawDocs

Set the CUDA device, replacing DEVICE_ID with the ID of the GPU to use.

export CUDA_VISIBLE_DEVICES=DEVICE_ID

Run the pipeline with the default configuration: BM25 document retrieval, BERT paragraph retrieval, and the BERT QA model.

python run_pipeline.py

Use ANN for document retrieval.

python run_pipeline.py --use_ann
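
This README does not describe how the ANN index is built. As general background only, the sketch below shows exact nearest-neighbor search over dense vectors by cosine similarity, which is the result an ANN index approximates at lower cost. The function names and the three-dimensional vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query_vec, doc_vecs, k=2):
    """Return (index, similarity) pairs for the k most similar document vectors."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings; real encoders produce far more dimensions.
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.6, 0.1]]
top = nearest([1.0, 0.0, 0.0], doc_vecs)
print(top)
```

This brute-force version compares the query with every document, so its cost grows linearly with the collection; ANN methods trade a small amount of accuracy to avoid that full scan.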

Use Distilled BERT for paragraph retrieval.

python run_pipeline.py --ranking_model_path ./models/bert_ranking_model_distilled

Keyphrase Extraction: detailed guides for generating keyphrases are in the kpe folder.

Running Results

The search result is a list of the top-k documents, and each document contains the following fields:

  • "title": Document title
  • "keyphrases": Extracted keyphrases
  • "text": Document text

The QA result is a list of the top-k answers, and each answer contains the following fields:

  • "text": Answer text
  • "title": The title of the document the answer comes from
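
A short sketch of how these two result lists might be consumed in Python. The field names come from the lists above; the sample values, the helper names, and the assumption that "keyphrases" is a list of strings are illustrative and not confirmed by the repository.

```python
# Hypothetical sample results that follow the field layout described above.
search_results = [
    {
        "title": "Example document title",
        "keyphrases": ["incubation period", "transmission"],  # assumed: list of strings
        "text": "Example document text.",
    },
]
qa_results = [
    {"text": "Example answer text.", "title": "Example document title"},
]


def format_search_result(doc):
    """One-line summary of a retrieved document."""
    return f'{doc["title"]} [{", ".join(doc["keyphrases"])}]'


def format_answer(ans):
    """One-line summary of an answer and its source document."""
    return f'{ans["text"]} (from: {ans["title"]})'


for doc in search_results:
    print(format_search_result(doc))
for ans in qa_results:
    print(format_answer(ans))
```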

Contribution

The following people contributed equally to this repository:

Aowei Lu, Jiahua Liu, Kaitao Zhang, Shi Yu, Si Sun, Zhenghao Liu

Project Organizers

  • Chenyan Xiong
    • Microsoft Research AI, Redmond, USA
    • Homepage
  • Zhiyuan Liu