
gregwchase / nih-chest-xray

Licence: MIT license
Identifying diseases in chest X-rays using convolutional neural networks

X-Net: Classifying Chest X-Rays Using Deep Learning

Background

In October 2017, the National Institutes of Health publicly released more than 112,000 chest X-ray images. Now known as ChestXray14, the dataset was released to help clinicians make better diagnostic decisions for patients with various lung diseases.

Table of Contents

  1. Objective
  2. Dataset
  3. Exploratory Data Analysis
  4. Pipeline
  5. Preprocessing
  6. Model (Structured Data)
  7. Model (Convolutional Neural Network)
  8. Explanations
  9. References

Objective

  • Train a convolutional neural network to detect and classify diagnoses of patients.
  • Couple structured and unstructured datasets together into a dual classifier.

Dataset

The ChestXray14 dataset consists of both images and structured data.

The image dataset consists of 112,000+ images from roughly 30,000 patients. Some patients have multiple scans, which is taken into consideration. All images are originally 1024 x 1024 pixels.

Due to data sourcing and corruption issues, my image dataset consists of 10,000 of the original 112,000 images. The structured model uses all of the structured data.

Structured data is also provided for each image. It includes features such as age, number of follow-up visits, AP vs. PA scan, and patient gender.

Exploratory Data Analysis

The raw labels contain 709 unique categories. On closer examination, the labels are hierarchical: some are a single finding, such as "Emphysema", while others combine several findings, such as "Emphysema | Cardiac Issues".

The average age is 58 years. However, about 400 patients have their age recorded in months, and one has it recorded in days.
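
The gap between the number of raw label strings and the number of underlying findings can be checked by splitting each label on the pipe character. The following is a minimal sketch that assumes the labels are available as plain strings; the sample list is made up from the two examples quoted above and is not the real data.

```python
from collections import Counter

# Hypothetical sample of raw label strings (not the real dataset).
raw_labels = [
    "Emphysema",
    "Emphysema | Cardiac Issues",
    "Cardiac Issues",
    "Emphysema | Cardiac Issues",
]

# Every distinct string counts as one category in the raw data.
unique_combinations = set(raw_labels)

# Splitting on "|" recovers the individual findings inside each string.
finding_counts = Counter(
    finding.strip() for label in raw_labels for finding in label.split("|")
)

print(len(unique_combinations))  # number of distinct raw label strings
print(sorted(finding_counts))    # distinct individual findings
```

On the full dataset, the same split is what separates the 709 raw categories from the much smaller set of individual findings.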

Pipeline

Two pipelines were created, one for each dataset. Each script is labeled as either "Structured" or "CNN" to indicate which data pipeline it belongs to.

| Description | Script | Pipeline |
| --- | --- | --- |
| EDA | eda.py | Structured |
| Resize Images | resize_images.py | CNN |
| Reconcile Labels | reconcile_labels.py | CNN |
| Convert Images to Arrays | image_to_array.py | CNN |
| CNN Model | cnn.py | CNN |
| Structured Data Model | model.py | Structured |

Preprocessing

First, the labels were changed to reflect single categories, as opposed to the hierarchical labels in the original dataset. This reduces the number of categories from 709 to 15. The label reduction takes its cue from the Stanford data scientists, who reduced the labels in the same way.
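
One way such a reduction could work is to keep only the first finding in each combined label. This is an assumption made for illustration: the README does not state which rule reconcile_labels.py applies, and the function name `to_single_label` is hypothetical.

```python
def to_single_label(raw_label: str, separator: str = "|") -> str:
    """Return the first finding in a combined label string.

    Assumption: the first finding is the one to keep. The actual rule
    used by the project is not documented here.
    """
    return raw_label.split(separator)[0].strip()


# Hypothetical sample built from the label examples in this README.
raw_labels = [
    "Emphysema",
    "Emphysema | Cardiac Issues",
    "Cardiac Issues | Emphysema",
]
single_labels = [to_single_label(label) for label in raw_labels]
print(single_labels)
```

Keeping one finding per image turns the problem into single-label classification, at the cost of discarding any additional findings on the same scan.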

Irrelevant columns were also removed. These columns either had zero variance, or provided minimal information on the patient diagnosis.
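
A zero-variance column is one that holds the same value in every row. The sketch below finds and drops such columns from a list of row dictionaries. The column names and values are hypothetical, and it does not attempt to judge which columns carry "minimal information", since that criterion is not spelled out.

```python
def constant_columns(rows):
    """Return the names of columns that hold one value across all rows."""
    return [
        column
        for column in rows[0]
        if len({row[column] for row in rows}) == 1
    ]


def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    to_drop = set(to_drop)
    return [
        {key: value for key, value in row.items() if key not in to_drop}
        for row in rows
    ]


# Hypothetical rows; "source" never varies, so it carries no signal.
rows = [
    {"age": 58, "view": "PA", "source": "NIH"},
    {"age": 61, "view": "AP", "source": "NIH"},
    {"age": 47, "view": "PA", "source": "NIH"},
]
removed = constant_columns(rows)
cleaned = drop_columns(rows, removed)
print(removed)
print(cleaned[0])
```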

Finally, patients whose age was given in months (M) or days (D) were removed. The amount of data removed is minimal and does not affect the analysis.
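
The age filter can be sketched as follows, assuming each age is stored as a number followed by a unit letter (Y, M, or D). The exact string format and the field names `id`, `age`, and `age_years` are assumptions for illustration.

```python
def keep_year_ages(records):
    """Keep records whose age is in years and add a numeric age field."""
    kept = []
    for record in records:
        age = record["age"].strip().upper()
        if age.endswith("Y"):
            # Drop the unit letter and parse the remaining digits.
            kept.append({**record, "age_years": int(age[:-1])})
    return kept


# Hypothetical records covering the three units mentioned above.
records = [
    {"id": 1, "age": "058Y"},
    {"id": 2, "age": "010M"},
    {"id": 3, "age": "005D"},
]
kept = keep_year_ages(records)
print(kept)
```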

Model (Structured Data)

The structured data model is a gradient boosting classifier. A random forest classifier was also tried, and the two produced nearly equal results. The GBM classifier was chosen because it trains faster than the random forest and produces results that are equal or better.

Results (Structured Data)

| Measurement | Score |
| --- | --- |
| Model | H2O Gradient Boosting Estimator |
| Log Loss | 1.670 |
| MSE | 0.510 |
| RMSE | 0.714 |
| R^2 | 0.967 |
| Mean Per-Class Error | 0.933 |
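
To make these metrics easier to read, the sketch below restates three of them in plain Python: RMSE as the square root of MSE, log loss as the average negative log-probability assigned to the true class, and mean per-class error as the average of the per-class misclassification rates. These are the standard definitions, written out for illustration; the function names are hypothetical and this is not the code that produced the table.

```python
import math


def rmse_from_mse(mse):
    """RMSE is the square root of MSE."""
    return math.sqrt(mse)


def log_loss(true_labels, predicted_probs):
    """Average negative log of the probability given to the true class."""
    losses = [
        -math.log(probs[label])
        for label, probs in zip(true_labels, predicted_probs)
    ]
    return sum(losses) / len(losses)


def mean_per_class_error(true_labels, predicted_labels):
    """Average of the per-class misclassification rates."""
    totals, errors = {}, {}
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if guess != truth:
            errors[truth] = errors.get(truth, 0) + 1
    rates = [errors.get(label, 0) / count for label, count in totals.items()]
    return sum(rates) / len(rates)


print(round(rmse_from_mse(0.510), 3))  # 0.714, matching the table

# Toy case: 15 classes, one example each, and a model that always
# predicts class 0.
truth = list(range(15))
guesses = [0] * 15
print(round(mean_per_class_error(truth, guesses), 3))  # 0.933
```

The toy case shows that a model predicting a single class regardless of input scores 14/15, or about 0.933, on 15 classes. That happens to equal the reported value, although nothing here confirms that this is how the model behaved.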

Model (Convolutional Neural Network)

The CNN was trained using Keras, with the TensorFlow backend.

The model is similar to the VGG architectures; 2 to 3 convolution layers are used in each set of layers, followed by a pooling layer.

Dropout is used in the fully connected layers only, which slightly improved the results.
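
The layer ordering described above can be written down as a plan: blocks of 2 to 3 convolutions, each block followed by pooling, then fully connected layers with dropout. The sketch below builds that plan in plain Python rather than in Keras, so it shows structure only. The input size, filter counts, dense units, and dropout rate are illustrative assumptions; only the 15 output classes come from this README.

```python
def vgg_style_plan(input_size, block_filters, convs_per_block,
                   dense_units, dropout_rate, num_classes):
    """List the layers of a VGG-style network as (kind, settings) pairs."""
    plan = []
    size = input_size
    for filters, n_convs in zip(block_filters, convs_per_block):
        for _ in range(n_convs):
            # 3x3 convolutions with "same" padding keep height and width.
            plan.append(("conv", {"filters": filters, "kernel": 3, "size": size}))
        # 2x2 max pooling with stride 2 halves height and width.
        size //= 2
        plan.append(("maxpool", {"pool": 2, "size": size}))
    plan.append(("flatten", {"units": size * size * block_filters[-1]}))
    for units in dense_units:
        plan.append(("dense", {"units": units}))
        # Dropout follows the fully connected layers only.
        plan.append(("dropout", {"rate": dropout_rate}))
    plan.append(("softmax", {"units": num_classes}))
    return plan


# All sizes below are illustrative; only the 15 classes come from the text.
plan = vgg_style_plan(
    input_size=128,
    block_filters=(32, 64, 128),
    convs_per_block=(2, 2, 3),
    dense_units=(256,),
    dropout_rate=0.5,
    num_classes=15,
)
for kind, settings in plan:
    print(kind, settings)
```

Each pooling step halves the spatial size, so the number of values reaching the first fully connected layer depends on both the input size and the number of blocks.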

Results (Convolutional Neural Network)

| Measurement | Score |
| --- | --- |
| Accuracy | 0.5456 |
| Precision | 0.306 |
| Recall | 0.553 |
| F1 | 0.394 |
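
F1 is the harmonic mean of precision and recall, so the reported values can be checked against one another. A short worked check, using only the numbers above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.306, 0.553), 3))  # 0.394
```

The result agrees with the reported F1, so the three scores are internally consistent.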

Explanations

Per the blog post from Luke Oakden-Rayner, there are multiple problems with this dataset. The most notable is that the images (and structured data) are labeled incorrectly. He also notes the annotators did not look at the images.

This became evident when training both models. Despite regularization and correcting for class imbalance, both models learned to return meaningless predictions. Given the labeling problems described above, this can be attributed to the incorrect labeling of the images.

Based on these findings and my own analysis, I agree with Oakden-Rayner's conclusion: "I believe the ChestXray14 dataset, as it exists now, is not fit for training medical AI systems to do diagnostic work."

This doesn't rule out convolutional neural networks as a way to predict diseases, but their success depends on the labels being correct. Once the images are correctly labeled, further analysis of the ChestXray14 dataset can resume.
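
One common way to address class imbalance is to weight each class inversely to its frequency, so that rare classes contribute as much to the loss as common ones. The sketch below uses the weight n_samples / (n_classes * class_count). This is an assumed technique for illustration: the README does not say how the imbalance was handled, and the sample labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, deliberately imbalanced sample.
labels = ["Cardiac Issues"] * 6 + ["Emphysema"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, the two rare "Emphysema" examples together count as much as the six "Cardiac Issues" examples.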

References

NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community

Algorithm better at diagnosing pneumonia than radiologists

AutoML: Automatic Machine Learning

Stacked Ensembles
