som-shahlab / trove

License: Apache-2.0
Weakly supervised medical named entity classification

Trove

Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data.

The COVID-19 pandemic has underlined the need for faster, more flexible ways of building and sharing state-of-the-art NLP/NLU tools to analyze electronic health records, scientific literature, and social media. Likewise, recent research into language modeling and the dangers of uncurated, "unfathomably" large-scale training data underlines the broader need to approach training set creation itself with more transparency and rigour.

Trove provides tools for combining freely available supervision sources such as medical ontologies from the Unified Medical Language System (UMLS), common text heuristics, and other noisy labeling sources for use as entity labelers in weak supervision frameworks such as Snorkel, FlyingSquid and others. Technical details are available in our manuscript.
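As a rough illustration of the idea above, the sketch below combines an ontology-style dictionary labeler with a simple text heuristic and resolves their votes by majority. It is not Trove's API: the term sets, the function names (ontology_labeler, suffix_labeler, majority_vote, label_tokens), and the label names are all hypothetical, and the tiny hard-coded dictionaries only stand in for terms that would normally come from a resource such as the UMLS.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeler returns None when it has no opinion

# Hypothetical stand-ins for term lists drawn from a medical ontology.
DISORDER_TERMS = {"pneumonia", "fever", "diabetes"}
DRUG_TERMS = {"aspirin", "metformin"}


def ontology_labeler(token: str) -> Optional[str]:
    """Label a token by exact (case-insensitive) dictionary lookup."""
    term = token.lower()
    if term in DISORDER_TERMS:
        return "DISORDER"
    if term in DRUG_TERMS:
        return "DRUG"
    return ABSTAIN


def suffix_labeler(token: str) -> Optional[str]:
    """Noisy text heuristic: treat words ending in -itis or -emia as disorders."""
    if re.search(r"(itis|emia)$", token.lower()):
        return "DISORDER"
    return ABSTAIN


LABELERS: List[Callable[[str], Optional[str]]] = [ontology_labeler, suffix_labeler]


def majority_vote(token: str, labelers=LABELERS) -> Optional[str]:
    """Combine labeler votes for one token, ignoring abstentions."""
    votes = [v for v in (lf(token) for lf in labelers) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


def label_tokens(tokens: List[str]) -> List[Optional[str]]:
    """Apply the combined labelers to every token in a sentence."""
    return [majority_vote(t) for t in tokens]


if __name__ == "__main__":
    sentence = ["Patient", "has", "bronchitis", "and", "takes", "metformin"]
    for token, label in zip(sentence, label_tokens(sentence)):
        print(token, label)
```

Majority voting is used here only to keep the sketch short. Weak supervision frameworks such as Snorkel instead estimate how reliable each labeling source is and weight the votes accordingly.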

Trove has been used as part of several COVID-19 research efforts at Stanford.

Getting Started

Tutorials

See tutorials/ for Jupyter notebooks walking through an example NER application.

Installation

Requirements: Python 3.6 or later. We recommend installing the dependencies with pip:

pip install -r requirements.txt

Experiments

NER experiments from the manuscript are found here. We are in the process of refactoring these for easier usage.

Contributions

We welcome all contributions to the code base! Please submit a pull request and/or start a discussion on GitHub Issues.

Weakly supervised methods for programmatically building and maintaining training sets provide new opportunities for the larger community to participate in the creation of important datasets. This is especially exciting in domains such as medicine, where sharing labeled data is often challenging due to patient privacy concerns.

Inspired by recent efforts such as HuggingFace's Datasets library, we would love to start a conversation about how to support sharing labelers in service of maintaining an open task library, so that it is easier to create, deploy, and version-control weakly supervised models.

Citation

If you use Trove in your research, please cite us!

Fries, J.A., Steinberg, E., Khattar, S. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun 12, 2017 (2021). https://doi.org/10.1038/s41467-021-22328-4

@article{fries2021trove,
  title={Ontology-driven weak supervision for clinical entity classification in electronic health records},
  author={Fries, Jason A and Steinberg, Ethan and Khattar, Saelig and Fleming, Scott L and Posada, Jose and Callahan, Alison and Shah, Nigam H},
  journal={Nature Communications},
  volume={12},
  number={1},
  year={2021},
  publisher={Nature Publishing Group}
}