swethasubramanian / Lungcancerdetection

Licence: mit
Use CNN to detect nodules in LIDC dataset.

LungCancerProject

Deep learning is a fast-evolving field with many implications for medical imaging.

Currently, medical images are interpreted by radiologists, physicians, and other specialists, but this interpretation can be very subjective. After years of looking at ultrasound images, my co-workers and I still get into arguments about whether we are actually seeing a tumor in a scan. Radiologists often have to look through large volumes of these images, which can cause fatigue and lead to mistakes. So there is a need to automate this.

Machine learning algorithms such as support vector machines are often used to detect and classify tumors, but they are limited by the assumptions we make when we define features, which reduces sensitivity. Deep learning could be an ideal solution because these algorithms are able to learn features from raw image data.

One challenge in implementing these algorithms is the scarcity of labeled medical image data. While this is a limitation for all applications of deep learning, it is more so for medical image data because of patient confidentiality concerns.

In this post you will learn how to build a convolutional neural network, train it, and have it detect lung nodules. I used data from the Lung Image Database Consortium and Image Database Resource Initiative [(LIDC/IDRI) database](https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI). As these images were huge (124 GB), I ended up using the reformatted version available for LUNA16. This dataset consists of 888 CT scans with annotations describing coordinates and ground truth labels. The first step was to create an image database for training.

Creating an image database

The images were formatted as .mhd and .raw files. The header data is contained in the .mhd files and the multidimensional image data is stored in the .raw files. I used the SimpleITK library to read the .mhd files. Each CT scan has dimensions of 512 x 512 x n, where n is the number of axial slices. There are about 200 slices in each CT scan.
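To make the header/data split concrete, here is a minimal sketch that parses the plain-text `key = value` lines of a MetaImage header using only the standard library. It is not the method used in this project (which relied on SimpleITK); the function name `parse_mhd_header` and the example header values are hypothetical and only illustrate the layout.

```python
def parse_mhd_header(text):
    """Parse the plain-text 'key = value' lines of a MetaImage (.mhd) header."""
    header = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        header[key.strip()] = value.strip()
    return header


# Hypothetical header contents, shaped like a typical CT scan header.
example_header = """\
ObjectType = Image
NDims = 3
Offset = -195.0 -195.0 -378.0
ElementSpacing = 0.76 0.76 2.5
DimSize = 512 512 141
ElementType = MET_SHORT
ElementDataFile = scan.raw
"""

header = parse_mhd_header(example_header)
dim_size = [int(v) for v in header["DimSize"].split()]
spacing = [float(v) for v in header["ElementSpacing"].split()]
origin = [float(v) for v in header["Offset"].split()]

print(dim_size)                   # image size along x, y, z
print(header["ElementDataFile"])  # name of the companion .raw file
```

In practice a library such as SimpleITK handles both files at once and also returns the origin and spacing, which are needed later to convert annotation coordinates.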

There were a total of 551065 annotations. Of these, 1351 were labeled as nodules and the rest were labeled negative, so there is a big class imbalance. The easy way to deal with it is to undersample the majority class and augment the minority class by rotating images.
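As a rough illustration of undersampling, the sketch below keeps every positive sample and a random subset of negatives. The helper `undersample_negatives`, the ratio of five negatives per positive, and the toy labels are assumptions made for this example, not the project's actual code or data.

```python
import random


def undersample_negatives(labels, negatives_per_positive, seed=0):
    """Return sorted indices keeping all positives and a random subset of negatives."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), negatives_per_positive * len(positives))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


# Class counts quoted in the text.
n_total, n_positive = 551065, 1351
print(f"positive fraction: {n_positive / n_total:.4%}")

# Toy labels (not the real annotations): 10 positives among 1000 candidates.
labels = [1] * 10 + [0] * 990
kept = undersample_negatives(labels, negatives_per_positive=5)
print(len(kept))  # 10 positives + 50 negatives = 60
```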

We could potentially train the CNN on all the pixels, but that would increase the computational cost and training time. So instead I decided to crop the images around the coordinates provided in the annotations. The annotations were provided in Cartesian (world) coordinates, so they had to be converted to voxel coordinates. Also, the image intensity was defined on the Hounsfield scale, so it had to be rescaled for image processing purposes.
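The two conversions can be sketched as follows. This assumes the scan axes are aligned with the world axes, and the Hounsfield window of -1000 to 400 is an assumed, commonly used choice rather than a value stated in this post. The function names and example coordinates are hypothetical.

```python
import numpy as np


def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Convert a world (mm) coordinate to integer voxel indices.

    Assumes the scan axes are aligned with the world axes (no rotation).
    """
    world = np.asarray(world_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    spacing = np.asarray(spacing_xyz, dtype=float)
    return np.rint((world - origin) / spacing).astype(int)


def normalize_hu(image, hu_min=-1000.0, hu_max=400.0):
    """Clip Hounsfield units to [hu_min, hu_max] and rescale to [0, 1]."""
    image = np.clip(np.asarray(image, dtype=float), hu_min, hu_max)
    return (image - hu_min) / (hu_max - hu_min)


# Made-up annotation, origin and spacing, all in (x, y, z) order.
voxel = world_to_voxel([-100.0, 50.0, -200.0],
                       [-195.0, -195.0, -378.0],
                       [0.5, 0.5, 2.0])
print(voxel)  # voxel indices (x, y, z)

print(normalize_hu([-2000, -1000, -300, 400, 1000]))  # values now lie in [0, 1]
```

The origin and spacing come from the scan header, so each annotation has to be converted with the values of the scan it belongs to.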

The script below generates 50 x 50 grayscale images for training, testing and validating a CNN.

<script src="https://gist.github.com/swethasubramanian/8483c5a21d0727e99976b0b9e2b60e68.js"></script>
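For readers who cannot load the embedded script, the cropping step can be approximated by the sketch below. It is not the author's code: the helper `crop_patch` and the zero padding at the borders are assumptions made so that every patch comes out at 50 x 50.

```python
import numpy as np


def crop_patch(slice_2d, center_row, center_col, size=50):
    """Crop a size x size patch centred on (center_row, center_col).

    The slice is zero-padded first so that patches near the border
    still come out at the full size.
    """
    half = size // 2
    padded = np.pad(slice_2d, half, mode="constant", constant_values=0)
    # After padding, the original pixel (r, c) sits at (r + half, c + half),
    # so the window starting at (r, c) in the padded image is centred on it.
    return padded[center_row:center_row + size, center_col:center_col + size]


# Stand-in for one 512 x 512 axial slice.
slice_2d = np.arange(512 * 512, dtype=float).reshape(512, 512)
patch = crop_patch(slice_2d, center_row=100, center_col=200)
corner = crop_patch(slice_2d, center_row=0, center_col=0)
print(patch.shape, corner.shape)  # (50, 50) (50, 50)
```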

The script above under-sampled the negative class so that 1 in every 6 images had a nodule, but the data set is still vastly imbalanced for training. I decided to augment my training set by rotating images. The script below does just that.

<script src="https://gist.github.com/swethasubramanian/72697b5cff4c5614c06460885dc7ae23.js"></script>

So for an original image, my script would create two rotated images:

[Images: original image, 90 degree rotation, 180 degree rotation]

Augmentation resulted in an 80-20 class distribution, which was not entirely ideal. But I also did not want to augment the minority class too much because it might result in a minority class with little variation.
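A minimal sketch of rotation-based augmentation with NumPy is shown here. The helper name `augment_with_rotations` is hypothetical, and the tiny 2 x 2 array stands in for a 50 x 50 patch so the effect is easy to read.

```python
import numpy as np


def augment_with_rotations(image):
    """Return the 90-degree and 180-degree rotations of a 2-D image."""
    return [np.rot90(image, k=1), np.rot90(image, k=2)]


image = np.array([[1, 2],
                  [3, 4]])
rot90, rot180 = augment_with_rotations(image)
print(rot90)   # rows: [2 4] and [1 3]
print(rot180)  # rows: [4 3] and [2 1]
```

Rotations by multiples of 90 degrees need no interpolation, so the augmented patches contain exactly the same pixel values as the original.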

Building a CNN

Now we are ready to build a CNN. After dabbling a bit with TensorFlow, I decided it was way too much work for something incredibly simple, so I decided to use tflearn. Tflearn is a high-level API wrapper around TensorFlow. It made coding a lot more palatable. The approach I used was similar to this. I used 3 convolutional layers in my architecture.

[Figure: CNN architecture]

My CNN model is defined in a class as shown in the script below.

<script src="https://gist.github.com/swethasubramanian/45be51b64d1595e78fb171c5dbb6cce6.js"></script>

I had a total of 6878 images in my training set.

Training the model

Because the data required to train a CNN is very large, it is often desirable to train the model in batches. Loading all the training data into memory is not always possible because you need enough memory to hold both the data and the features. I was working on a 2012 MacBook Pro, so I decided to load all the images into an HDF5 dataset using the h5py library. You can find the script I used to do that here.

Once I had the training data in an HDF5 dataset, I trained the model using this script.
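The idea of batch-wise loading can be sketched as a small generator. This is an illustration rather than the project's training loop: `iterate_minibatches` is a hypothetical helper, and in-memory NumPy arrays stand in for the on-disk dataset. Because it only uses contiguous slices, the same pattern should also apply to array-likes that read from disk on demand, such as h5py datasets.

```python
import numpy as np


def iterate_minibatches(images, labels, batch_size):
    """Yield (images, labels) batches using only contiguous slices.

    Contiguous slicing also works on lazily loaded array-likes, so only
    one batch needs to be held in memory at a time.
    """
    n_samples = len(images)
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        yield images[start:stop], labels[start:stop]


# Stand-in data: 10 blank 50 x 50 images with alternating labels.
images = np.zeros((10, 50, 50), dtype=np.float32)
labels = np.arange(10) % 2
batch_sizes = [len(x) for x, _ in iterate_minibatches(images, labels, batch_size=4)]
print(batch_sizes)  # [4, 4, 2]
```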

<script src="https://gist.github.com/swethasubramanian/dca76567afe1c175e016b2ce299cb7fb.js"></script>

The training took a couple of hours on my laptop. Like any engineer, I wanted to see what goes on under the hood. As the filters are of low resolution (5x5), it is more useful to visualize the feature maps they generate.

So if I pass this image through the first convolutional layer (50 x 50 x 32), it generates a feature map that looks like this:

[Figure: conv_layer_0]

The max pooling layer following the first layer downsampled the feature map by a factor of 2. So when the downsampled feature map is passed into the second convolutional layer of 64 5x5 filters, the resulting feature map is:

[Figure: conv_layer_1]

The feature map generated by the third convolutional layer, which contains 64 3x3 filters, is:

[Figure: conv_layer_2]
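The feature map sizes described above can be traced with a little shape arithmetic. The sketch assumes stride-1 convolutions with "same" padding, which is consistent with the 50 x 50 x 32 output of the first layer, and it assumes no pooling between the second and third convolutional layers, since the text does not say either way. Both helper functions are hypothetical.

```python
def conv_same(shape, n_filters):
    """Output shape of a stride-1 convolution with 'same' padding."""
    height, width, _ = shape
    return (height, width, n_filters)


def max_pool(shape, factor=2):
    """Output shape of a max-pooling layer that downsamples by `factor`."""
    height, width, channels = shape
    return (height // factor, width // factor, channels)


shape = (50, 50, 1)            # grayscale input patch
conv0 = conv_same(shape, 32)   # first convolutional layer, 32 filters
pool0 = max_pool(conv0)        # pooling after the first layer
conv1 = conv_same(pool0, 64)   # second layer, 64 filters of 5x5
conv2 = conv_same(conv1, 64)   # third layer, 64 filters of 3x3 (no pooling in between: an assumption)
print(conv0, pool0, conv1, conv2)
```

With "same" padding the filter size does not change the spatial dimensions, so only the pooling layers shrink the feature maps.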

Testing data

I tested my CNN model on 1623 images. I had a validation accuracy of 93%. My model has a precision of 89.3% and a recall of 71.2%. The model has a specificity of 98.2%.

Here is the confusion matrix.

[Figure: confusion_matrix]
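For reference, this is how accuracy, precision, recall and specificity follow from the four cells of a binary confusion matrix. The counts in the example are made up for illustration and are not the values from this model's confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),     # of predicted nodules, how many were real
        "recall": tp / (tp + fn),        # of real nodules, how many were found
        "specificity": tn / (tn + fp),   # of real negatives, how many were rejected
    }


# Illustrative counts only (not the author's results).
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A high specificity combined with a lower recall, as reported above, means the model rarely raises false alarms but still misses a share of the true nodules.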

I looked deeper into the sort of predictions the model made:

- False negative predictions: [Figure: preds_fns]
- False positive predictions: [Figure: preds_fps]
- True negative predictions: [Figure: preds_tns]
- True positive predictions: [Figure: preds_tps]