yuhaozhang / summarize-radiology-findings

Code and pretrained model for paper "Learning to Summarize Radiology Findings"

Learning to Summarize Radiology Findings

This repo contains the PyTorch code and pretrained model for the paper Learning to Summarize Radiology Findings.

Requirements

  • Python 3 (tested on 3.6.5)
  • PyTorch (tested on 0.4.1)
  • tqdm
  • pythonrouge
  • unzip, wget (for downloading only)
  • nltk, ansicolors (for interactive demo only)

Overview

Due to privacy requirements, we are unable to release the Stanford radiology report data used in the paper. However, for completeness:

  1. We have included a summarization model pretrained on the Stanford data (see the Training section), so you can either fine-tune the model on your own data or run evaluation on an open dataset such as the Indiana University dataset (see the Data section);

  2. We have included an interactive demo script, so that you can easily run the pretrained model on any radiology report you have (see the Interactive Demo section).

Data

The dataset folder includes the following data:

  • stanford-sample: Sample data from the Stanford report repository (preprocessed);
  • iu-chest: Preprocessed test data from the Indiana University Chest X-ray dataset, originally downloaded from the NLM Open-i website. It contains 2691 unique reports, used as a test dataset in the paper.

All included data uses the jsonl format: each line is a JSON object with three key-value pairs: background, findings, impression. For more details on this jsonl format, please refer to utils/jsonl.py.
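
As a rough illustration of this format, the sketch below writes and reads records with the three keys named above. The helper names `write_jsonl` and `read_jsonl` and the sample text are hypothetical and are not taken from utils/jsonl.py, which remains the reference. Whether each value is stored as a raw string or as a token list is not stated here; the sketch assumes plain strings.

```python
import json
import os
import tempfile

# The three keys described in the Data section.
REQUIRED_KEYS = ("background", "findings", "impression")


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a jsonl file and check that every record has the three keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [k for k in REQUIRED_KEYS if k not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            records.append(record)
    return records


# Made-up example record; not taken from any real dataset.
sample = [{
    "background": "chest radiograph. clinical history: cough.",
    "findings": "the lungs are clear. no pleural effusion.",
    "impression": "no acute cardiopulmonary abnormality.",
}]

tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "test.jsonl")
write_jsonl(sample_path, sample)
loaded = read_jsonl(sample_path)
print(loaded[0]["impression"])
```

A check like this can be useful before training on your own corpus, since a record with a missing key is reported with its line number.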

Training

Preparation

The summarization model is initialized with GloVe word vectors pretrained on 4.5 million Stanford radiology reports. We have made these pretrained word vectors available. First, you have to download these vectors by running:

chmod +x download.sh; ./download.sh

Then, assuming you have your own radiology report corpus in the dataset/$REPORT directory, you can prepare the vocabulary and initial word vectors with:

python prepare_vocab.py dataset/$REPORT dataset/vocab --glove_dir dataset/glove

This will write the vocabulary and the word vectors (as a numpy matrix) into the dataset/vocab directory.
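
The vocabulary step can be pictured roughly as follows. This is a simplified sketch and not the logic of prepare_vocab.py: it assumes whitespace tokenization, a hypothetical `min_freq` cutoff, hypothetical special tokens `<PAD>` and `<UNK>`, and a toy in-memory vector table standing in for the downloaded GloVe file. Plain Python lists are used in place of a numpy matrix to keep the example dependency-free.

```python
import random
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens


def build_vocab(texts, min_freq=1):
    """Collect tokens seen at least min_freq times, most frequent first."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    words = sorted(
        (w for w, c in counts.items() if c >= min_freq),
        key=lambda w: (-counts[w], w),  # frequency, then alphabetical
    )
    return [PAD, UNK] + words


def init_embeddings(vocab, pretrained, dim, seed=0):
    """Build one vector per vocabulary entry.

    Pretrained vectors are copied where available, the padding row is
    all zeros, and every other row is drawn uniformly from [-1, 1].
    """
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word == PAD:
            matrix.append([0.0] * dim)
        elif word in pretrained:
            matrix.append(list(pretrained[word]))
        else:
            matrix.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return matrix


texts = ["the lungs are clear", "the heart is normal"]
toy_vectors = {"the": [0.1, 0.2], "lungs": [0.3, 0.4]}  # stand-in for GloVe
vocab = build_vocab(texts)
emb = init_embeddings(vocab, toy_vectors, dim=2)
print(len(vocab), vocab[:3])
```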

Run training

To start training on your own data, run:

python train.py --id $ID --data_dir dataset/$REPORT --background

This will train a summarization model with a copy mechanism and a background encoder, and save everything into the saved_models/$ID directory. For other parameters, please refer to train.py.
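
For readers unfamiliar with copy mechanisms, the sketch below shows one common formulation, the pointer-generator mixture, in plain Python. It is an assumption that this matches the variant implemented in train.py; the function name `copy_distribution` and all numbers are illustrative only.

```python
def copy_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix a generation distribution with attention-based copy probabilities.

    P(w) = p_gen * P_vocab(w)
           + (1 - p_gen) * (sum of attention over source positions holding w)
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {w: p_gen * p for w, p in vocab_probs.items()}
    # Copy part: add attention mass to whichever token sits at each position,
    # including tokens that are not in the fixed vocabulary.
    for weight, token in zip(attention, source_tokens):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


vocab_probs = {"no": 0.5, "acute": 0.3, "<UNK>": 0.2}
source_tokens = ["no", "pneumothorax", "no", "effusion"]
attention = [0.05, 0.8, 0.05, 0.1]
dist = copy_distribution(0.5, vocab_probs, attention, source_tokens)
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

In this toy setup the word "pneumothorax" is absent from the generation vocabulary, yet it can still receive probability mass because the attention weights point at it in the source text.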

Pretrained model

We have included a model pretrained on 87k+ Stanford radiology reports in pretrained/model.pt.

Evaluation

To start evaluation, run:

python eval.py saved_models/$ID --model best_model.pt --data_dir dataset/$REPORT --dataset test

This will look for the dataset/$REPORT/test.jsonl file and run evaluation on it. Use --data_dir dataset/iu-chest to run evaluation on the Indiana University data; add --out predictions.txt to write the predicted summaries to a file; add --gold gold.txt to write the gold summaries to a file.
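
Once --out and --gold have produced the two files, a quick sanity check of their overlap is possible without a full ROUGE toolkit. The sketch below computes a simplified unigram-overlap F1 (in the spirit of ROUGE-1). It assumes one whitespace-tokenized summary per line in each file, which is an assumption about the output layout, and it is not a replacement for the scores that eval.py reports. The function names are hypothetical.

```python
from collections import Counter


def unigram_f1(prediction, reference):
    """F1 over clipped unigram matches between two whitespace-tokenized strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def corpus_unigram_f1(predictions, references):
    """Average the per-summary F1 over aligned prediction/reference pairs."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [unigram_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# In practice these lists would be read line by line from
# predictions.txt and gold.txt; the strings here are made up.
predictions = ["no acute disease", "small bowel obstruction"]
references = [
    "no acute cardiopulmonary disease",
    "findings suggest small bowel obstruction",
]
score = corpus_unigram_f1(predictions, references)
print(round(score, 3))
```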

Interactive Demo

You can run an interactive demo with the following command:

python run.py pretrained/model.pt

Then follow the prompt to input different sections. Here is an example report to start with:

Background: Three views of the abdomen: <date>. Comparison: <date>. Clinical history: a xx-year-old male status post hirschsprung’s disease repair.

Findings: The supine, left-sided decubitus and erect two views of the abdomen show increased dilatation of the small bowel since the prior exam on <date>. There are multiple air-fluid levels, suggesting bowel obstruction. No free intraperitoneal gas is present.

Citation

@inproceedings{zhang2018radsum,
 author = {Zhang, Yuhao and Ding, Daisy Yi and Qian, Tianpei and Manning, Christopher D. and Langlotz, Curtis P.},
 booktitle = {EMNLP 2018 Workshop on Health Text Mining and Information Analysis},
 title = {Learning to Summarize Radiology Findings},
 url = {https://nlp.stanford.edu/pubs/zhang2018radsum.pdf},
 year = {2018}
}

License

All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.
