
gsarti / Covid Papers Browser

License: GPL-2.0
Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖

Projects that are alternatives to or similar to Covid Papers Browser

Awesome Search
Awesome Search: all about (e-commerce) search and its awesomeness
Stars: ✭ 361 (+124.22%)
Mutual labels:  search-engine, natural-language-processing
Bertsearch
Elasticsearch with BERT for advanced document search.
Stars: ✭ 684 (+324.84%)
Mutual labels:  search-engine, natural-language-processing
Covid 19 Bert Researchpapers Semantic Search
A BERT semantic search engine for searching COVID-19 coronavirus research papers in Google Colab
Stars: ✭ 23 (-85.71%)
Mutual labels:  search-engine, natural-language-processing
Rnn lstm from scratch
How to build RNNs and LSTMs from scratch with NumPy.
Stars: ✭ 156 (-3.11%)
Mutual labels:  natural-language-processing
Speech signal processing and classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, i.e., developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources will be used toward achieving our goal, such as KALDI. Comparisons will be made against [6-8].
Stars: ✭ 155 (-3.73%)
Mutual labels:  natural-language-processing
Nlpre
Python library for Natural Language Preprocessing (NLPre)
Stars: ✭ 158 (-1.86%)
Mutual labels:  natural-language-processing
Nlp bahasa resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (-1.86%)
Mutual labels:  natural-language-processing
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154 (-4.35%)
Mutual labels:  natural-language-processing
Pytorch Nlp
Basic Utilities for PyTorch Natural Language Processing (NLP)
Stars: ✭ 1,996 (+1139.75%)
Mutual labels:  natural-language-processing
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+7827.33%)
Mutual labels:  natural-language-processing
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Stars: ✭ 12,626 (+7742.24%)
Mutual labels:  natural-language-processing
Holiday Cn
📅🇨🇳 Chinese statutory holiday data, automatically scraped daily from State Council announcements
Stars: ✭ 157 (-2.48%)
Mutual labels:  natural-language-processing
Sf1r Lite
Search Formula-1: a distributed, high-performance engine for enterprise/vertical search over massive data
Stars: ✭ 158 (-1.86%)
Mutual labels:  search-engine
Swagaf
Repository for paper "SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference"
Stars: ✭ 156 (-3.11%)
Mutual labels:  natural-language-processing
Mixtext
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification
Stars: ✭ 159 (-1.24%)
Mutual labels:  natural-language-processing
Pythonrouge
Python wrapper for evaluating summarization quality with the ROUGE package
Stars: ✭ 155 (-3.73%)
Mutual labels:  natural-language-processing
Mtbook
Machine Translation: Foundations and Models, by Xiao Tong and Zhu Jingbo
Stars: ✭ 2,307 (+1332.92%)
Mutual labels:  natural-language-processing
Awesome Pytorch List
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Stars: ✭ 12,475 (+7648.45%)
Mutual labels:  natural-language-processing
Sling
SLING - A natural language frame semantics parser
Stars: ✭ 1,892 (+1075.16%)
Mutual labels:  natural-language-processing
Tis Solr
An enterprise search engine based on Apache Solr
Stars: ✭ 158 (-1.86%)
Mutual labels:  search-engine

Covid-19 Semantic Browser: Browse Covid-19 & SARS-CoV-2 Scientific Papers with Transformers 🦠 📖

Covid-19 Semantic Browser is an interactive experimental tool leveraging a state-of-the-art language model to search relevant content inside the COVID-19 Open Research Dataset (CORD-19) recently published by the White House and its research partners. The dataset contains over 44,000 scholarly articles about COVID-19, SARS-CoV-2 and related coronaviruses.

Various models already fine-tuned on Natural Language Inference (NLI) are available to perform the search: SciBERT-NLI [1], BioBERT-NLI [2], and CovidBERT-NLI.

All models are trained on SNLI [3] and MultiNLI [4] using the sentence-transformers library [5] to produce universal sentence embeddings [6]. Embeddings are subsequently used to perform semantic search on CORD-19.
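
For illustration, the following minimal sketch (not the repository's exact code) shows how such embeddings can drive semantic search with the sentence-transformers library; the gsarti/scibert-nli model name and the toy abstracts are assumptions for the example:

import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed checkpoint; any NLI fine-tuned model works the same way.
model = SentenceTransformer("gsarti/scibert-nli")

abstracts = [
    "We model the transmission dynamics of SARS-CoV-2 in Wuhan.",
    "A survey of antiviral drug candidates against coronaviruses.",
]
query = "How is the virus transmitted between humans?"

# Encode corpus and query into fixed-size sentence embeddings.
corpus_emb = model.encode(abstracts)   # shape (n_docs, dim)
query_emb = model.encode([query])[0]   # shape (dim,)

# Rank abstracts by cosine similarity to the query.
sims = corpus_emb @ query_emb / (
    np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(query_emb)
)
for i in np.argsort(-sims):
    print(f"{sims[i]:.3f}  {abstracts[i]}")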

Currently supported operations are:

  • Browse paper abstracts with interactive queries.

  • Reproduce SciBERT-NLI, BioBERT-NLI and CovidBERT-NLI training results.

Setup

Python 3.6 or higher is required to run the code. First, install the required libraries with pip, then download the en_core_web_sm language pack for spaCy and the punkt tokenizer data for NLTK:

pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m nltk.downloader punkt

Using the Browser

First of all, download a model fine-tuned on NLI from HuggingFace's cloud repository.

python scripts/download_model.py --model scibert-nli

Second, download the data from the Kaggle challenge page and place it in the data folder.

Finally, simply run:

python scripts/interactive_search.py

to enter the interactive demo. Using a GPU is recommended, since creating the embeddings for the entire corpus can be time-consuming otherwise. Both the corpus and the embeddings are cached on disk after the first execution of the script, so subsequent runs are much faster.
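
The caching pattern looks roughly like the sketch below; the cache path and function name are hypothetical, not the script's actual internals:

import os
import pickle
from sentence_transformers import SentenceTransformer

CACHE_PATH = "data/embeddings.pkl"  # hypothetical cache location

def embed_corpus(abstracts, model_name="gsarti/scibert-nli"):
    # Reuse cached embeddings if a previous run already computed them.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            return pickle.load(f)
    model = SentenceTransformer(model_name)
    embeddings = model.encode(abstracts, show_progress_bar=True)
    with open(CACHE_PATH, "wb") as f:
        pickle.dump(embeddings, f)
    return embeddings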

Use the interactive demo as follows:

Demo GIF

Reproducing Training Results for Transformers

First, download a pretrained model from HuggingFace's cloud repository.

python scripts/download_model.py --model scibert

Second, download the NLI datasets used for training and the STS dataset used for testing.

python scripts/get_finetuning_data.py

Finally, run the finetuning script by adjusting the parameters depending on the model you intend to train (default is scibert-nli).

python scripts/finetune_nli.py

The model will be evaluated against the test portion of the Semantic Textual Similarity (STS) benchmark dataset at the end of training. Please refer to my model cards for details on parameter values.
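
For reference, the general sentence-transformers NLI fine-tuning recipe looks like the sketch below. This illustrates the method from [5], not the repository's finetune_nli.py; the toy examples stand in for the SNLI/MultiNLI training pairs and the STS evaluation data:

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Wrap a pretrained encoder with mean pooling to obtain sentence embeddings.
word_emb = models.Transformer("allenai/scibert_scivocab_cased")
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_emb, pooling])

# NLI pairs (toy stand-ins for SNLI/MultiNLI);
# labels: 0=contradiction, 1=entailment, 2=neutral.
train_examples = [
    InputExample(texts=["A virus spreads.", "A pathogen is transmitted."], label=1),
    InputExample(texts=["A virus spreads.", "Nothing is transmitted."], label=0),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

# STS pairs with gold similarity scores in [0, 1] for end-of-training evaluation.
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is eating food."],
    sentences2=["Someone is eating."],
    scores=[0.9],
)

model.fit(
    train_objectives=[(train_loader, train_loss)],
    evaluator=evaluator,
    epochs=1,
    warmup_steps=100,
)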

References

[1] Beltagy et al. 2019, "SciBERT: Pretrained Language Model for Scientific Text"

[2] Lee et al. 2020, "BioBERT: a pre-trained biomedical language representation model for biomedical text mining"

[3] Bowman et al. 2015, "A large annotated corpus for learning natural language inference"

[4] Williams et al. 2018, "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference"

[5] Reimers et al. 2019, "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks"

[6] Conneau et al. 2017, "Supervised Learning of Universal Sentence Representations from Natural Language Inference Data"
