yuhaozhang / summarize-radiology-findings

Licence: other

Code and pretrained model for paper "Learning to Summarize Radiology Findings"

Programming Languages: python, shell

Projects that are alternatives of or similar to summarize-radiology-findings

OAProgression

Multimodal Machine Learning-based Knee Osteoarthritis Progression Prediction from Plain Radiographs and Clinical Data

Stars: ✭ 58 (-7.94%)

Mutual labels: medicine, radiology

wolfpacs

WolfPACS is a DICOM load balancer written in Erlang.

Stars: ✭ 1 (-98.41%)

Mutual labels: radiology

gazeta

Gazeta: Dataset for automatic summarization of Russian news

Stars: ✭ 25 (-60.32%)

Mutual labels: summarization

query-focused-sum

Official code repository for "Exploring Neural Models for Query-Focused Summarization".

Stars: ✭ 17 (-73.02%)

Mutual labels: summarization

PyRouge

A python library to compute rouge score for summarization

Stars: ✭ 54 (-14.29%)

Mutual labels: summarization

product crawler

The Open Source Search Engine for Product Components

Stars: ✭ 23 (-63.49%)

Mutual labels: medicine

Entity2Topic

[NAACL2018] Entity Commonsense Representation for Neural Abstractive Summarization

Stars: ✭ 20 (-68.25%)

Mutual labels: summarization

FYP-AutoTextSum

Automatic Text Summarization with Machine Learning

Stars: ✭ 16 (-74.6%)

Mutual labels: summarization

article-summary-deep-learning

📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!

Stars: ✭ 18 (-71.43%)

Mutual labels: summarization

textdigester

TextDigester: document summarization java library

Stars: ✭ 23 (-63.49%)

Mutual labels: summarization

2021-dialogue-summary-competition

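[2021 Hunminjeongeum Korean Speech and Natural Language AI Competition] Repository for sharing the dialogue summarization training and inference code of team 알라꿍달라꿍 in the dialogue summarization track.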

Stars: ✭ 86 (+36.51%)

Mutual labels: summarization

video-summarizer

Summarizes videos into much shorter videos. Ideal for long lecture videos.

Stars: ✭ 92 (+46.03%)

Mutual labels: summarization

kaggle brain-tumor-3D

Predict the status of a genetic biomarker important for brain cancer treatment

Stars: ✭ 20 (-68.25%)

Mutual labels: radiology

Niffler

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.

Stars: ✭ 52 (-17.46%)

Mutual labels: radiology

DenseNet-MURA-PyTorch

Implementation of DenseNet model on Stanford's MURA dataset using PyTorch

Stars: ✭ 59 (-6.35%)

Mutual labels: radiology

fawkes

🚀🚀 Fetch, parse, categorize, summarize user reviews 🚀🚀

Stars: ✭ 83 (+31.75%)

Mutual labels: summarization

text2text

Text2Text: Cross-lingual natural language processing and generation toolkit

Stars: ✭ 188 (+198.41%)

Mutual labels: summarization

Copycat-abstractive-opinion-summarizer

ACL 2020 Unsupervised Opinion Summarization as Copycat-Review Generation

Stars: ✭ 76 (+20.63%)

Mutual labels: summarization

sidenet

SideNet: Neural Extractive Summarization with Side Information

Stars: ✭ 52 (-17.46%)

Mutual labels: summarization

technical-articles

Technical Pieces collected in practices

Stars: ✭ 35 (-44.44%)

Mutual labels: summarization


Learning to Summarize Radiology Findings

This repo contains the PyTorch code and pretrained model for the paper Learning to Summarize Radiology Findings.

Requirements

Python 3 (tested on 3.6.5)
PyTorch (tested on 0.4.1)
tqdm
pythonrouge
unzip, wget (for downloading only)
nltk, ansicolors (for interactive demo only)

Overview

Due to privacy requirements, we are unfortunately unable to release the Stanford radiology report data used in the paper. For completeness, however:

We have included a summarization model pretrained on the Stanford data (see the Training section), so you can either fine-tune the model on your own data or run evaluation on another open dataset such as the Indiana University dataset (see the Data section);
We have included an interactive demo script, so that you can easily run the pretrained model on any radiology report you have (see the Interactive Demo section).

Data

The dataset folder includes the following data:

stanford-sample: Sample data from the Stanford report repository (preprocessed);
iu-chest: Preprocessed test data from the Indiana University Chest X-ray dataset, originally downloaded from the NLM Open-i website. It contains 2691 unique reports, used as a test dataset in the paper.

All included data uses a jsonl format, with each line being a json string with three key-value pairs: background, findings, impression. For more details on this jsonl format please refer to utils/jsonl.py.
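As a rough illustration of the jsonl layout described above, the following sketch writes and reads records with the three keys. The helper names `write_jsonl` and `read_jsonl` and the sample text are hypothetical; they are not taken from utils/jsonl.py, which may behave differently.

```python
import json
import os
import tempfile

# The three keys every record is expected to carry.
REQUIRED_KEYS = ("background", "findings", "impression")


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_jsonl(path):
    """Read records, skipping blank lines and checking the expected keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            missing = [k for k in REQUIRED_KEYS if k not in rec]
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            records.append(rec)
    return records


# Made-up sample record, for illustration only.
sample = [{
    "background": "Chest radiograph. Clinical history: cough.",
    "findings": "The lungs are clear. No pleural effusion.",
    "impression": "No acute cardiopulmonary abnormality.",
}]

tmp_path = os.path.join(tempfile.mkdtemp(), "test.jsonl")
write_jsonl(tmp_path, sample)
loaded = read_jsonl(tmp_path)
print(loaded[0]["impression"])
```

A file prepared this way for your own corpus would go under dataset/$REPORT, next to the other splits you use.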

Training

Preparation

The summarization model is initialized with GloVe word vectors pretrained on 4.5 million Stanford radiology reports. We have made these pretrained word vectors available. First, you have to download these vectors by running:

chmod +x download.sh; ./download.sh

Then, assuming you have your own radiology report corpus in the dataset/$REPORT directory, you can prepare the vocabulary and initial word vectors with:

python prepare_vocab.py dataset/$REPORT dataset/vocab --glove_dir dataset/glove

This writes the vocabulary and the word vectors (as a numpy matrix) into the dataset/vocab directory.
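A minimal sketch of this kind of vocabulary preparation follows, assuming a frequency-cutoff vocabulary and random initialization for words that have no pretrained vector. The function names, the special tokens, the `min_freq` parameter, and the tiny two-dimensional vectors are all assumptions for illustration; prepare_vocab.py may work differently, and it stores the vectors as a numpy matrix rather than the plain lists used here.

```python
import random
from collections import Counter

SPECIAL_TOKENS = ["<PAD>", "<UNK>"]  # hypothetical special tokens


def build_vocab(token_lists, min_freq=1):
    """Collect words seen at least min_freq times, after the special tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    words = sorted(w for w, c in counts.items() if c >= min_freq)
    return SPECIAL_TOKENS + words


def init_embeddings(vocab, pretrained, dim, seed=0):
    """Return one vector per vocabulary entry, in vocabulary order.

    Words with a pretrained vector reuse it, the padding token gets zeros,
    and everything else is drawn uniformly from [-1, 1].
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
        elif word == "<PAD>":
            matrix.append([0.0] * dim)
        else:
            matrix.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return matrix


# Made-up tokenized reports and made-up "pretrained" vectors.
reports = [["no", "acute", "findings"], ["no", "pleural", "effusion"]]
pretrained = {"no": [0.1, 0.2], "effusion": [0.3, 0.4]}

vocab = build_vocab(reports)
emb = init_embeddings(vocab, pretrained, dim=2)
print(len(vocab), len(emb))
```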

Run training

To start training on your own data, run

python train.py --id $ID --data_dir dataset/$REPORT --background

This will train a summarization model with a copy mechanism and a background encoder, and save everything into the saved_models/$ID directory. For other parameters, please refer to train.py.
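For readers unfamiliar with copy mechanisms, the sketch below shows one common formulation, the pointer-generator mixture, in which the output distribution blends a vocabulary distribution with attention weights over the source tokens. This illustrates the general technique under that assumption and is not a reproduction of the model built by train.py; the function name `mix_copy_distribution` and all numbers are made up.

```python
def mix_copy_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Blend generation and copy probabilities into one distribution.

    p_gen         -- probability of generating from the vocabulary (0..1)
    vocab_probs   -- dict mapping vocabulary word -> generation probability
    attention     -- attention weights over source positions (sum to 1)
    source_tokens -- the source token at each position
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * p for word, p in vocab_probs.items()}
    # Copy part: add (1 - p_gen) * attention mass to each source token,
    # summing over repeated occurrences of the same token.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


# Made-up numbers: "small" is not in the vocabulary but can still be copied.
vocab_probs = {"no": 0.5, "effusion": 0.3, "<UNK>": 0.2}
source_tokens = ["small", "effusion", "small"]
attention = [0.2, 0.5, 0.3]

final = mix_copy_distribution(0.6, vocab_probs, attention, source_tokens)
for word, prob in sorted(final.items()):
    print(word, round(prob, 3))
```

The point of the mixture is that a word absent from the vocabulary, such as a rare anatomical term in the findings, can still receive probability mass through the copy term.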

Pretrained model

We have included a model pretrained on 87k+ Stanford radiology reports in pretrained/model.pt.

Evaluation

To start evaluation, run

python eval.py saved_models/ --model best_model.pt --data_dir dataset/$REPORT --dataset test

This will look for the file dataset/$REPORT/test.jsonl and run evaluation on it. Use --data_dir dataset/iu-chest to run evaluation on the Indiana University data; add --out predictions.txt to write the predicted summaries to a file; add --gold gold.txt to write the gold summaries to a file.
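Once the predicted and gold summaries are written to files, they can be compared line by line. The sketch below computes a simplified unigram-overlap F1, similar in spirit to ROUGE-1 but without stemming or other preprocessing. It is not the pythonrouge scoring used by eval.py, and the function names `unigram_f1` and `score_files` are hypothetical. It assumes one summary per line, with lines aligned between the two files.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_files(pred_path, gold_path):
    """Average unigram F1 over aligned lines of two text files."""
    with open(pred_path, encoding="utf-8") as p, \
            open(gold_path, encoding="utf-8") as g:
        preds = [line.strip() for line in p]
        golds = [line.strip() for line in g]
    if len(preds) != len(golds):
        raise ValueError("prediction and gold files differ in length")
    if not preds:
        return 0.0
    return sum(unigram_f1(a, b) for a, b in zip(preds, golds)) / len(preds)


# Made-up summaries, for illustration only.
print(unigram_f1("no acute disease", "no acute cardiopulmonary disease"))
```

With the files produced by the command above, this could be called as score_files("predictions.txt", "gold.txt"). Scores from this sketch are not comparable to the ROUGE numbers reported by eval.py.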

Interactive Demo

You can run an interactive demo with the following command:

python run.py pretrained/model.pt

Then follow the prompt to input different sections. Here is an example report to start with:

Background: Three views of the abdomen: <date>. Comparison: <date>. Clinical history: a xx-year-old male status post hirschsprung’s disease repair.

Findings: The supine, left-sided decubitus and erect two views of the abdomen show increased dilatation of the small bowel since the prior exam on <date>. There are multiple air-fluid levels, suggesting bowel obstruction. No free intraperitoneal gas is present.

Citation

@inproceedings{zhang2018radsum,
 author = {Zhang, Yuhao and Ding, Daisy Yi and Qian, Tianpei and Manning, Christopher D. and Langlotz, Curtis P.},
 booktitle = {EMNLP 2018 Workshop on Health Text Mining and Information Analysis},
 title = {Learning to Summarize Radiology Findings},
 url = {https://nlp.stanford.edu/pubs/zhang2018radsum.pdf},
 year = {2018}
}

Licence

All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.
