ptschandl / HAM10000_dataset

Licence: other
Tools for workup of the HAM10000 dataset

HAM 10000 Dataset Tools

Creative Commons License

This repository gives access to the tools created and used for assembling the training dataset for the proposed HAM-10000 (Human Against Machine with 10000 training images) study, which extends part 3 of the ISIC 2018 challenge. The dataset itself is available for download at the Harvard dataverse or the ISIC-archive.


Extract

The following technique was used to extract image data from PowerPoint slides and order the images with unique identifiers:
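A minimal sketch of one way to do this, not the repository's own script. It relies on the fact that a .pptx file is a ZIP archive whose embedded pictures sit under ppt/media/. The function name extract_slide_images and the identifier scheme (first 16 hex characters of a SHA-256 content hash) are assumptions made for illustration.

```python
import hashlib
import zipfile
from pathlib import Path, PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def extract_slide_images(pptx_path, out_dir):
    """Copy embedded images out of a .pptx, naming each by a content hash."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        # Sorting the member names gives a reproducible order across runs.
        for name in sorted(archive.namelist()):
            member = PurePosixPath(name)
            if member.parent.as_posix() != "ppt/media":
                continue  # skip slides, themes and other package parts
            suffix = member.suffix.lower()
            if suffix not in IMAGE_SUFFIXES:
                continue  # skip non-image media
            data = archive.read(name)
            # Hypothetical identifier scheme: truncated SHA-256 of the bytes.
            uid = hashlib.sha256(data).hexdigest()[:16]
            target = out_dir / f"{uid}{suffix}"
            target.write_bytes(data)
            written.append(target)
    return written
```

Because the identifier is derived from the image bytes, the same picture embedded in several slide decks maps to the same file name, which makes duplicates easy to spot.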


Filter

To order large image sets containing non-annotated overview (clinic), close-up (macro) and dermatoscopic (dsc) images more efficiently, we fine-tuned a neural network to distinguish between those types automatically.

1. Annotation

  • filter/filter_annotation.py: An OpenCV-based script to quickly annotate images within a subfolder into different image types. Results are stored in a CSV file, with the option to abort and resume annotation.
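The abort-and-resume behaviour can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in filter_annotation.py: the column names filename and label, the helper names, and the ask_label callback (which stands in for the OpenCV window and key handling) are all hypothetical.

```python
import csv
from pathlib import Path

LABELS = ("clinic", "macro", "dsc")  # image types named in the text


def load_done(csv_path):
    """Return {filename: label} for rows already stored, or {} if none."""
    csv_path = Path(csv_path)
    if not csv_path.exists():
        return {}
    with csv_path.open(newline="") as handle:
        return {row["filename"]: row["label"] for row in csv.DictReader(handle)}


def annotate(image_names, csv_path, ask_label):
    """Label images missing from the CSV; ask_label returns None to abort."""
    csv_path = Path(csv_path)
    done = load_done(csv_path)
    needs_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "label"])
        if needs_header:
            writer.writeheader()
        for name in image_names:
            if name in done:
                continue  # already annotated in an earlier session
            label = ask_label(name)
            if label is None:
                break  # user aborted; rows written so far are kept
            if label not in LABELS:
                raise ValueError(f"unknown label: {label!r}")
            writer.writerow({"filename": name, "label": label})
            handle.flush()  # persist each row so an abort loses nothing
            done[name] = label
    return done
```

Appending and flushing one row per image means a session can be interrupted at any point, and the next call simply skips every file name already present in the CSV.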

2. Training

Training was performed in Caffe / DIGITS, which abstracts away many training variables. We obtained 1501 annotated images with the tool above and proceeded to training: GoogLeNet pretrained on ImageNet (taken from the NVIDIA DIGITS 5 Model Store) was fine-tuned on three classes for 20 epochs, reaching a final top-1 accuracy of 98.68% on the test set (one dermatoscopic image classified as macro). The trained model files are provided in ./classify/caffe_model/*
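The text does not state the size of the test set. Assuming the 98.68% figure is rounded to two decimals and reflects exactly the one misclassification mentioned, a quick check shows which test-set size is consistent with it. The function name is hypothetical.

```python
def consistent_test_set_sizes(reported_pct, errors, max_size=1000):
    """List test-set sizes whose accuracy rounds to the reported percentage."""
    sizes = []
    for n in range(errors + 1, max_size + 1):
        accuracy_pct = 100 * (n - errors) / n
        if round(accuracy_pct, 2) == reported_pct:
            sizes.append(n)
    return sizes


print(consistent_test_set_sizes(98.68, errors=1))  # prints [76]
```

Under these assumptions, 75 of 76 test images were classified correctly.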

3. Inference


Unify

Pathologic diagnoses in clinical practice are often non-standardized and verbose. The notebook below shows the boilerplate we used on different datasets to merge raw string data into a clean set of classes.

  • unify/unify_diagnoses.ipynb uses the pandas library to clean and unify diagnosis texts of dermatologic lesions into a confined set of diagnoses, or into other or ambiguous classes.
    Note: The notebook contains only a subset of example terms for display purposes, as the regular expressions are optimized to fit a given dataset. The ones given will therefore usually not be ready to apply to a new dataset out of the box. Importantly, the order in which diagnoses are relabeled also matters, so we highly recommend a manual check of relabeled diagnoses and stepwise iteration when applying this to a new dataset.
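Why the order of relabeling matters can be shown with a small sketch. It uses plain re rather than pandas to stay self-contained, and the patterns and class labels are hypothetical examples, not the terms from the notebook.

```python
import re

# Hypothetical rules, tried top to bottom; the first matching rule wins.
RULES = [
    (r"\?|\bvs\b", "ambiguous"),
    (r"melanoma", "melanoma"),
    (r"n[ae]+vus|naevi|nevi", "nevus"),
    (r"basal cell carcinoma|\bbcc\b", "bcc"),
]


def unify(raw, rules=RULES):
    """Map a free-text diagnosis to one class; unmatched text becomes 'other'."""
    text = raw.strip().lower()
    for pattern, label in rules:
        if re.search(pattern, text):
            return label
    return "other"


report = "Melanoma in situ arising in a dysplastic naevus"
print(unify(report))                         # melanoma
print(unify(report, list(reversed(RULES))))  # nevus
```

The same report ends up in a different class once the rule order is reversed, which is why stepwise iteration and manual review are advisable. With pandas, such a function could be applied to a column of diagnosis strings via Series.map.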

Standardise

To normalise the image format without squeezing (distorting the aspect ratio), a single Bash/ImageMagick command was applied to the final images before data submission to the archive:

find . -type f \( -iname \*.jpg -o -iname \*.jpeg -o -iname \*.tiff -o -iname \*.tif \) -print0 | xargs -0 -n1 mogrify -strip -rotate "90<" -resize "600x450^" -gravity center -crop 600x450+0+0 -density 72 -units PixelsPerInch -format jpg -quality 100
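The geometry this command applies can be sketched as follows: -rotate "90<" turns an image by 90 degrees only when it is narrower than it is tall, -resize "600x450^" scales it uniformly until it covers the 600x450 box, and the centred crop trims the overflow. The function below only computes the resulting sizes and offsets. Its name is hypothetical, and ImageMagick's exact rounding may differ by a pixel.

```python
def standardise_geometry(width, height, target_w=600, target_h=450):
    """Return (rotated, resized_size, crop_offset) for one input image size."""
    # -rotate "90<": rotate by 90 degrees only when width < height.
    rotated = width < height
    if rotated:
        width, height = height, width
    # -resize "600x450^": scale uniformly so the image covers the target box.
    scale = max(target_w / width, target_h / height)
    new_w = max(target_w, round(width * scale))
    new_h = max(target_h, round(height * scale))
    # -gravity center -crop 600x450+0+0: cut the target box from the middle.
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return rotated, (new_w, new_h), (left, top)


print(standardise_geometry(900, 1200))   # (True, (600, 450), (0, 0))
print(standardise_geometry(1000, 1000))  # (False, (600, 600), (0, 75))
```

Because scaling is uniform and the excess is cropped rather than compressed, every output is 600x450 without changing the proportions of the lesion.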


Segment

  • segment/imagej_fiji_macros.ijm contains macros enabling an efficient workflow for loading, reviewing, correcting and creating binary segmentation masks. Masks to be reviewed must be created beforehand by other means. These macros were used, together with the neural network based on this paper, to create the segmentation masks for analyses in Tschandl et al. 2020.

Cite

If tools or data helped your research, please cite:

  • Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).
@article{Tschandl2018_HAM10000,
  author    = {Philipp Tschandl and
               Cliff Rosendahl and
               Harald Kittler},
  title     = {The {HAM10000} dataset, a large collection of multi-source dermatoscopic
               images of common pigmented skin lesions},
  journal   = {Sci. Data},
  volume    = {5},
  year      = {2018},
  pages     = {180161},
  doi       = {10.1038/sdata.2018.161}
}

If you used the segmentation macros or resulting segmentation masks from here, please cite:

@article{Tschandl2020_NatureMedicine,
  author = {Philipp Tschandl and Christoph Rinner and Zoe Apalla and Giuseppe Argenziano and Noel Codella and Allan Halpern and Monika Janda and Aimilios Lallas and Caterina Longo and Josep Malvehy and John Paoli and Susana Puig and Cliff Rosendahl and H. Peter Soyer and Iris Zalaudek and Harald Kittler},
  title = {Human{\textendash}computer collaboration for skin cancer recognition},
  journal = {Nature Medicine},
  volume = {26},
  number = {8},
  year = {2020},
  pages = {1229--1234},
  doi = {10.1038/s41591-020-0942-0},
  url = {https://doi.org/10.1038/s41591-020-0942-0}
}