jind11 / PubMed-PICO-Detection

Licence: other

PubMed PICO Element Detection Dataset

Projects that are alternatives of or similar to PubMed-PICO-Detection

text-classification-cn

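Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pre-trained models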

Stars: ✭ 81 (+118.92%)

Mutual labels: corpus

PoetryCorpus

Poetic corpus of the Russian language

Stars: ✭ 40 (+8.11%)

Mutual labels: corpus

cljs-corpus

A greppable archive of ClojureScript code

Stars: ✭ 37 (+0%)

Mutual labels: corpus

TV4Dialog

No description or website provided.

Stars: ✭ 33 (-10.81%)

Mutual labels: corpus

CBLUE

CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark (a benchmark for Chinese medical information processing)

Stars: ✭ 379 (+924.32%)

Mutual labels: corpus

CLUEmotionAnalysis2020

CLUE Emotion Analysis Dataset (a fine-grained sentiment analysis dataset)

Stars: ✭ 3 (-91.89%)

Mutual labels: corpus

thaigov-corpus

A project that collects news from Thai government websites

Stars: ✭ 19 (-48.65%)

Mutual labels: corpus

thai-language

Computer tools for the Thai language

Stars: ✭ 20 (-45.95%)

Mutual labels: corpus

pdf-corpus

Python script to quickly create hand-crafted PDF files

Stars: ✭ 17 (-54.05%)

Mutual labels: corpus

KAREN

KAREN: Unifying Hatespeech Detection and Benchmarking

Stars: ✭ 18 (-51.35%)

Mutual labels: sentence-classification

Customer-Feedback-Analysis

Multi Class Text (Feedback) Classification using CNN, GRU Network and pre trained Word2Vec embedding, word embeddings on TensorFlow.

Stars: ✭ 18 (-51.35%)

Mutual labels: sentence-classification

egret-wenda-corpus

A Public Corpus for Machine Learning

Stars: ✭ 41 (+10.81%)

Mutual labels: corpus

bible-corpus

A multilingual parallel corpus created from translations of the Bible.

Stars: ✭ 115 (+210.81%)

Mutual labels: corpus

LanguageCodes

We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).

Stars: ✭ 70 (+89.19%)

Mutual labels: corpus

named-entity-recognition-template

Build a deep learning model for predicting the named entities from text.

Stars: ✭ 51 (+37.84%)

Mutual labels: corpus

kanji-frequency

Kanji usage frequency data collected from various sources

Stars: ✭ 92 (+148.65%)

Mutual labels: corpus

CNN-Sentence-Classification

A tensorflow implementation of Convolutional Neural Networks for Sentence Classification

Stars: ✭ 77 (+108.11%)

Mutual labels: sentence-classification

folia

FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…

Stars: ✭ 56 (+51.35%)

Mutual labels: corpus

NSP-BERT

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

Stars: ✭ 166 (+348.65%)

Mutual labels: sentence-classification

KWDLC

Kyoto University Web Document Leads Corpus

Stars: ✭ 64 (+72.97%)

Mutual labels: corpus


PubMed PICO Element Detection Dataset

This dataset was introduced in: Jin, Di, and Peter Szolovits. "PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks." Proceedings of the BioNLP 2018 Workshop. 2018.

Abstract

Successful evidence-based medicine (EBM) applications rely on answering clinical questions by analyzing large medical literature databases. In order to formulate a well-defined, focused clinical question, a framework called PICO is widely used, which identifies the sentences in a given medical text that belong to the four components: Participants/Problem (P), Intervention (I), Comparison (C) and Outcome (O). In this work, we present a Long Short-Term Memory (LSTM) neural network based model to automatically detect PICO elements. By jointly classifying subsequent sentences in the given text, we achieve state-of-the-art results on PICO element classification compared to several strong baseline models. We also make our curated data public as a benchmarking dataset so that the community can benefit from it.

Some miscellaneous information:

structured_abstracts_PICO contains the original abstracts. A line that starts with ### gives the PMID. Each line after it contains the original section heading, the assigned gold label used for training and testing, and the section content, separated by the symbol |. The gold label is derived by checking keywords in the section heading; the mapping rules are described in the paper cited above.
structured_abstracts_sentences_PICO is almost the same as structured_abstracts_PICO, except that each section's content has been split into sentences with the Stanford CoreNLP toolkit, so that each line holds exactly one sentence, and all numbers have been replaced by @.
The folder splitted contains the train, validation and test sets, which were randomly split from the file structured_abstracts_sentences_PICO at a ratio of 8:1:1.
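
The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loading code. It assumes that the fields appear in the order heading|label|content, that the text after ### is the PMID, and that any further | characters belong to the section content. The sample text, the label values in it, and the function names parse_abstracts and split_8_1_1 are hypothetical. The split helper works at the abstract level, which is also an assumption, since the description above does not say whether the 8:1:1 split was made per abstract or per sentence.

```python
import random
from typing import Dict, List, Tuple

# (section heading, gold label, section content)
Section = Tuple[str, str, str]


def parse_abstracts(text: str) -> Dict[str, List[Section]]:
    """Group 'heading|label|content' lines under the PMID of the preceding '###' line."""
    abstracts: Dict[str, List[Section]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between abstracts
        if line.startswith("###"):
            current = line[3:].strip()  # assumed: the rest of the line is the PMID
            abstracts[current] = []
        elif current is not None:
            # Split on the first two '|' only, so a '|' inside the content survives.
            parts = [p.strip() for p in line.split("|", 2)]
            if len(parts) == 3:
                abstracts[current].append((parts[0], parts[1], parts[2]))
    return abstracts


def split_8_1_1(pmids, seed: int = 0):
    """Shuffle PMIDs reproducibly and cut them into 80% / 10% / 10% parts."""
    ids = sorted(pmids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Hypothetical sample in the described layout; the labels are placeholders.
sample = """###12345
BACKGROUND|A|Some background sentence .
METHODS|P|We enrolled @ patients .

###67890
RESULTS|O|Outcome improved | see table .
"""

parsed = parse_abstracts(sample)
train, val, test = split_8_1_1([str(i) for i in range(10)])
```

To use this on the real data, read one of the dataset files into a string and pass it to parse_abstracts. It is worth checking the first few lines of the file against these assumptions before relying on the result.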

You are most welcome to share with us your analyses or work using this dataset by citing our paper!

