som-shahlab / trove

License: Apache-2.0

Weakly supervised medical named entity classification

Programming Languages

python

139335 projects - #7 most used programming language

Jupyter Notebook

11667 projects

Projects that are alternatives of or similar to trove

Kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Stars: ✭ 2,235 (+3963.64%)

Mutual labels: text-classification, ner, bert

concept-based-xai

Library implementing state-of-the-art Concept-based and Disentanglement Learning methods for Explainable AI

Stars: ✭ 41 (-25.45%)

Mutual labels: weak-supervision, weakly-supervised-learning

datagrand bert

5th-place code for the 2019 Datagrand Cup information extraction competition

Stars: ✭ 20 (-63.64%)

Mutual labels: ner, bert

classifier multi label seq2seq attention

multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification, seq2seq, attention, beam search

Stars: ✭ 26 (-52.73%)

Mutual labels: text-classification, bert

classifier multi label

multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification

Stars: ✭ 127 (+130.91%)

Mutual labels: text-classification, bert

Advances-in-Label-Noise-Learning

A curated (most recent) list of resources for Learning with Noisy Labels

Stars: ✭ 360 (+554.55%)

Mutual labels: weakly-supervised-learning, learning-with-noisy-labels

weasel

Weakly Supervised End-to-End Learning (NeurIPS 2021)

Stars: ✭ 117 (+112.73%)

Mutual labels: weak-supervision, weakly-supervised-learning

protonet-bert-text-classification

finetune bert for small dataset text classification in a few-shot learning manner using ProtoNet

Stars: ✭ 28 (-49.09%)

Mutual labels: text-classification, bert

BERT-chinese-text-classification-pytorch

This repo contains a PyTorch implementation of a pretrained BERT model for text classification.

Stars: ✭ 92 (+67.27%)

Mutual labels: text-classification, bert

backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Stars: ✭ 229 (+316.36%)

Mutual labels: text-classification, bert

tensorflow-ml-nlp-tf2

Practice materials for "Natural Language Processing Starting with TensorFlow 2 and Machine Learning (from Logistic Regression to BERT and GPT3)"

Stars: ✭ 245 (+345.45%)

Mutual labels: ner, bert

neuro-comma

🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺

Stars: ✭ 46 (-16.36%)

Mutual labels: ner, bert

ChineseNER

All about Chinese NER

Stars: ✭ 241 (+338.18%)

Mutual labels: ner, bert

ERNIE-text-classification-pytorch

This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.

Stars: ✭ 49 (-10.91%)

Mutual labels: text-classification, bert

wrench

WRENCH: Weak supeRvision bENCHmark

Stars: ✭ 185 (+236.36%)

Mutual labels: weak-supervision, weakly-supervised-learning

GEANet-BioMed-Event-Extraction

Code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs

Stars: ✭ 52 (-5.45%)

Mutual labels: biomedical, bert

WeSHClass

[AAAI 2019] Weakly-Supervised Hierarchical Text Classification

Stars: ✭ 83 (+50.91%)

Mutual labels: text-classification, weakly-supervised-learning

Marktool

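A web-based, general-purpose text annotation tool. It supports large-scale entity annotation, relation annotation, event annotation, text classification, automatic annotation based on dictionary matching and regular-expression matching, and standard-name annotation for normalization, as well as iterative text annotation and nested entity annotation. Annotation guidelines are customizable and, within the same task type, can be "created once and reused many times". Hierarchical entity sets expand the range of entity types, and a new, efficient annotation method improves user experience and annotation efficiency. The tool also adds a review step in which annotations from multiple annotators can be checked for consistency and adjusted, improving the accuracy and reliability of the annotated corpus.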

Stars: ✭ 190 (+245.45%)

Mutual labels: text-classification, ner

Learning-From-Rules

Implementation of experiments in paper "Learning from Rules Generalizing Labeled Exemplars" to appear in ICLR2020 (https://openreview.net/forum?id=SkeuexBtDr)

Stars: ✭ 46 (-16.36%)

Mutual labels: weak-supervision, weakly-supervised-learning

Kevinpro-NLP-demo

All NLP you Need Here. Personal implementations of some fun NLP demos; currently includes PyTorch implementations of 13 NLP applications.

Stars: ✭ 117 (+112.73%)

Mutual labels: text-classification, bert


Trove

Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data.

The COVID-19 pandemic has underlined the need for faster, more flexible ways of building and sharing state-of-the-art NLP/NLU tools to analyze electronic health records, scientific literature, and social media. Likewise, recent research into language modeling and the dangers of uncurated, "unfathomably" large-scale training data underlines the broader need to approach training set creation itself with more transparency and rigour.

Trove provides tools for combining freely available supervision sources such as medical ontologies from the Unified Medical Language System (UMLS), common text heuristics, and other noisy labeling sources for use as entity labelers in weak supervision frameworks such as Snorkel, FlyingSquid and others. Technical details are available in our manuscript.
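As a rough illustration of the idea of combining noisy labeling sources, the sketch below votes over a few toy labelers for a single entity span. It does not use Trove's API: the term sets, function names, and label constants are all hypothetical, and the tiny dictionaries merely stand in for ontology-derived term lists. Frameworks such as Snorkel typically learn a label model over the labeler outputs; the simple majority vote here is only a stand-in for that step.

```python
from collections import Counter

# Label constants (hypothetical; chosen for this sketch only).
ABSTAIN = -1   # the labeler has no opinion about this span
NEGATIVE = 0   # the span is not a disorder mention
POSITIVE = 1   # the span is a disorder mention

# Toy term sets standing in for ontology-derived dictionaries.
DISORDER_TERMS = {"fever", "cough", "shortness of breath"}
NON_DISORDER_TERMS = {"patient", "clinic"}


def lf_ontology_disorder(span):
    """Dictionary labeler: vote POSITIVE when the span is a known disorder term."""
    return POSITIVE if span.lower() in DISORDER_TERMS else ABSTAIN


def lf_ontology_other(span):
    """Dictionary labeler: vote NEGATIVE when the span is a known non-disorder term."""
    return NEGATIVE if span.lower() in NON_DISORDER_TERMS else ABSTAIN


def lf_numeric(span):
    """Text heuristic: numeric spans such as '38.5' are not disorders."""
    return NEGATIVE if span.replace(".", "", 1).isdigit() else ABSTAIN


LABELERS = [lf_ontology_disorder, lf_ontology_other, lf_numeric]


def majority_vote(span):
    """Combine labeler votes for one span; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(span) for lf in LABELERS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for span in ["Fever", "38.5", "patient", "aspirin"]:
        print(span, majority_vote(span))
```

Spans on which every labeler abstains (here, "aspirin") receive no weak label and would simply be left out of the generated training set.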

Trove has been used as part of several COVID-19 research efforts at Stanford:

Continuous symptom profiling of patients screened for SARS-CoV-2. We used a daily feed of patient notes from Stanford Health Care emergency departments to generate up-to-date COVID-19 symptom frequency data. Funded by the Bill & Melinda Gates Foundation.
Estimating the efficacy of symptom-based screening for COVID-19, published in npj Digital Medicine.
Our COVID-19 symptom data was used by CMU's DELPHI group to prioritize selection of informative features from Google's Symptom Search Trends dataset.

Getting Started

Tutorials

See tutorials/ for Jupyter notebooks walking through an example NER application.

Installation

Requirements: Python 3.6 or later. We recommend using pip to install the dependencies:

pip install -r requirements.txt

Experiments

NER experiments from the manuscript are found here. We are in the process of refactoring these for easier usage.

Contributions

We welcome all contributions to the code base! Please submit a pull request and/or start a discussion on GitHub Issues.

Weakly supervised methods for programmatically building and maintaining training sets provide new opportunities for the larger community to participate in the creation of important datasets. This is especially exciting in domains such as medicine, where sharing labeled data is often challenging due to patient privacy concerns.

Inspired by recent efforts such as HuggingFace's Datasets library, we would love to start a conversation around how to support sharing labelers in service of maintaining an open task library, so that it is easier to create, deploy, and version control weakly supervised models.

Citation

If you use Trove in your research, please cite us!

Fries, J.A., Steinberg, E., Khattar, S. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun 12, 2017 (2021). https://doi.org/10.1038/s41467-021-22328-4

@article{fries2021trove,
  title={Ontology-driven weak supervision for clinical entity classification in electronic health records},
  author={Fries, Jason A and Steinberg, Ethan and Khattar, Saelig and Fleming, Scott L and Posada, Jose and Callahan, Alison and Shah, Nigam H},
  journal={Nature Communications},
  volume={12},
  number={1},
  year={2021},
  publisher={Nature Publishing Group}
}
