
stanfordmlgroup / CheXbert

License: other
Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to CheXbert

eye-tracker-setup
👀 Tobii Eye Tracker 4C Setup
Stars: ✭ 24 (-52.94%)
Mutual labels:  medical-imaging, radiology
monai-deploy
MONAI Deploy aims to become the de-facto standard for developing, packaging, testing, deploying and running medical AI applications in clinical production.
Stars: ✭ 56 (+9.8%)
Mutual labels:  medical-imaging, radiology
bert attn viz
Visualize BERT's self-attention layers on text classification tasks
Stars: ✭ 41 (-19.61%)
Mutual labels:  bert
korpatbert
KorPatBERT, a Korean AI language model specialized for the patent domain
Stars: ✭ 48 (-5.88%)
Mutual labels:  bert
rasa-bert-finetune
BERT fine-tuning with support for rasa-nlu
Stars: ✭ 46 (-9.8%)
Mutual labels:  bert
datagrand bert
5th-place code for the 2019 Datagrand Cup information extraction competition
Stars: ✭ 20 (-60.78%)
Mutual labels:  bert
ExpBERT
Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations"
Stars: ✭ 28 (-45.1%)
Mutual labels:  bert
LAMB Optimizer TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
Stars: ✭ 119 (+133.33%)
Mutual labels:  bert
Machine-Learning-in-Medical-Imaging--U-Net
TUM_MLMI_SS16: Convolutional Neural Network using U-Net architecture to predict one modality of a brain MRI scan from another modality.
Stars: ✭ 22 (-56.86%)
Mutual labels:  medical-imaging
CS-Net
CS-Net (MICCAI 2019) and CS2-Net (MedIA 2020)
Stars: ✭ 53 (+3.92%)
Mutual labels:  medical-imaging
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (+80.39%)
Mutual labels:  bert
GEANet-BioMed-Event-Extraction
Code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs
Stars: ✭ 52 (+1.96%)
Mutual labels:  bert
GateContrib
User-oriented public repository of Gate (macros, examples and user contributions)
Stars: ✭ 57 (+11.76%)
Mutual labels:  medical-imaging
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+309.8%)
Mutual labels:  bert
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Stars: ✭ 738 (+1347.06%)
Mutual labels:  bert
subpixel-embedding-segmentation
PyTorch Implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedding (ORAL, MICCAIW 2021)
Stars: ✭ 22 (-56.86%)
Mutual labels:  medical-imaging
BERTOverflow
A Pre-trained BERT on StackOverflow Corpus
Stars: ✭ 40 (-21.57%)
Mutual labels:  bert
R-AT
Regularized Adversarial Training
Stars: ✭ 19 (-62.75%)
Mutual labels:  bert
BERT-QE
Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".
Stars: ✭ 43 (-15.69%)
Mutual labels:  bert
TriB-QA
We are serious about bragging
Stars: ✭ 45 (-11.76%)
Mutual labels:  bert

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

CheXbert is an accurate, automated deep-learning-based chest radiology report labeler that labels reports for the following 14 medical observations: Fracture, Consolidation, Enlarged Cardiomediastinum, No Finding, Pleural Other, Cardiomegaly, Pneumothorax, Atelectasis, Support Devices, Edema, Pleural Effusion, Lung Lesion, Lung Opacity, and Pneumonia.

Paper (Accepted to EMNLP 2020): https://arxiv.org/abs/2004.09167

License from us (For Commercial Purposes): http://techfinder2.stanford.edu/technology_detail.php?ID=43869

Abstract

The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, is able to outperform the previous best rule-based labeler with statistical significance, setting a new SOTA for report labeling on one of the largest datasets of chest x-rays.

The CheXbert approach

Prerequisites

(Recommended) Install requirements with pip, using Python 3.7 or higher:

pip install -r requirements.txt

OR

Create conda environment

conda env create -f environment.yml

Activate environment

conda activate chexbert

By default, all available GPUs are used for labeling in parallel. If there is no GPU, the CPU is used. You can control which GPUs are used by setting CUDA_VISIBLE_DEVICES appropriately. The default batch size is 18 but can be changed in constants.py.
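For example, to restrict labeling to the first two GPUs (device IDs 0 and 1 here are purely illustrative), prefix the labeling command described under Usage below:

CUDA_VISIBLE_DEVICES=0,1 python label.py -d={path to reports} -o={path to output dir} -c={path to checkpoint}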

Checkpoint download

Download our trained model checkpoint here: https://stanfordmedicine.box.com/s/c3stck6w6dol3h36grdc97xoydzxd7w9.

This model was first trained on ~187,000 MIMIC-CXR radiology reports labeled by the CheXpert labeler and then further trained on a separate set of 1000 radiologist-labeled reports from the MIMIC-CXR dataset, augmented with backtranslation. The MIMIC-CXR reports are deidentified and do not contain PHI. This model differs from the one in our paper, which was instead trained on radiology reports from the CheXpert dataset.

Usage

Label reports with CheXbert

Put all reports in a csv file under the column name "Report Impression". Let the path to this csv be {path to reports}. Download the PyTorch checkpoint and let the path to it be {path to checkpoint}. Let the path to your desired output folder be {path to output dir}.
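If your reports live in a Python list rather than a csv, a minimal sketch of building the expected input file with pandas (the report strings here are purely illustrative):

import pandas as pd

# Illustrative report impressions; the labeler only requires this one column
reports = [
    "No acute cardiopulmonary abnormality.",
    "Mild cardiomegaly with small bilateral pleural effusions.",
]
pd.DataFrame({"Report Impression": reports}).to_csv("reports.csv", index=False)

Then run the labeler: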

python label.py -d={path to reports} -o={path to output dir} -c={path to checkpoint} 

The output file with labeled reports is {path to output dir}/labeled_reports.csv

Run the following for descriptions of all command line arguments:

python label.py -h

Ignore any error messages about the size of the report exceeding 512 tokens; all reports are automatically truncated to 512 tokens.

Train a model on labeled reports

Put all train/dev set reports in csv files under the column name "Report Impression". The labels for each of the 14 conditions should be in columns with the corresponding names, and the class labels should follow the convention described in the Label Convention section below.

Training is a two-step process. First, you must tokenize and save all the report impressions in the train and dev sets as lists:

python bert_tokenizer.py -d={path to train/dev reports csv} -o={path to output list}
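bert_tokenizer.py performs this step for you; purely for intuition, a rough sketch of the idea, assuming the HuggingFace bert-base-uncased tokenizer and pickle for serialization (the script's actual details may differ):

import pickle
import pandas as pd
from transformers import BertTokenizer

# Tokenize each impression into BERT input IDs, truncated to 512 tokens
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
df = pd.read_csv("train_reports.csv")
imp_list = [
    tokenizer.encode(str(imp), add_special_tokens=True, truncation=True, max_length=512)
    for imp in df["Report Impression"]
]

# Save the list of token ID lists for the training script
with open("train_imp_list.pkl", "wb") as f:
    pickle.dump(imp_list, f)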

After saving the tokenized report impression lists for the train and dev sets, you can run training as follows. You can modify the batch size or learning rate in constants.py.

python run_bert.py --train_csv={path to train reports csv} --dev_csv={path to dev reports csv} --train_imp_list={path to train impressions list} --dev_imp_list={path to dev impressions list} --output_dir={path to checkpoint saving directory}

The above command will initialize BERT-base weights and then train the model. If you want to initialize the model with BlueBERT or BioBERT weights (or potentially any other pretrained weights), you should download their checkpoints, convert them to PyTorch using the HuggingFace transformers command line utility (https://huggingface.co/transformers/converting_tensorflow_models.html), and provide the path to the checkpoint folder in the PRETRAIN_PATH variable in constants.py. Then run the above command.
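For a BERT-style TensorFlow checkpoint, the conversion typically looks like the following (file names are placeholders; consult the linked HuggingFace page for the authoritative invocation):

transformers-cli convert --model_type bert --tf_checkpoint {path to tf checkpoint}/model.ckpt --config {path to tf checkpoint}/bert_config.json --pytorch_dump_output {path to checkpoint folder}/pytorch_model.bin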

If you wish to train further from an existing CheXbert checkpoint you can run:

python run_bert.py --train_csv={path to train reports csv} --dev_csv={path to dev reports csv} --train_imp_list={path to train impressions list} --dev_imp_list={path to dev impressions list} --output_dir={path to checkpoint saving directory} --checkpoint={path to existing CheXbert checkpoint}

Label Convention

The labeler outputs the following numbers corresponding to classes. This convention is the same as that of the CheXpert labeler.

  • Blank: NaN
  • Positive: 1
  • Negative: 0
  • Uncertain: -1
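For instance, a small sketch that reads the labeler's output and tallies these codes as readable strings (the condition columns named here are examples; pandas is assumed):

import pandas as pd

# CheXbert's numeric codes; NaN means the condition is blank (unmentioned)
code_to_label = {1.0: "positive", 0.0: "negative", -1.0: "uncertain"}

df = pd.read_csv("labeled_reports.csv")
for col in ["Cardiomegaly", "Edema"]:  # example condition columns
    print(col, df[col].map(code_to_label).fillna("blank").value_counts().to_dict())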

Citation

If you use the CheXbert labeler in your work, please cite our paper:

@misc{smit2020chexbert,
	title={CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT},
	author={Akshay Smit and Saahil Jain and Pranav Rajpurkar and Anuj Pareek and Andrew Y. Ng and Matthew P. Lungren},
	year={2020},
	eprint={2004.09167},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
}