mahmoodlab / FAIRY

Licence: GPL-3.0 license
Fast and scalable search of whole-slide images via self-supervised deep learning - Nature Biomedical Engineering


SISH

Fast and scalable search of whole-slide images via self-supervised deep learning

Nature Biomedical Engineering


TL;DR: SISH is a histology whole slide image (WSI) search pipeline whose search time scales as O(1), so search speed stays constant regardless of the size of the database. SISH uses self-supervised deep learning to encode meaningful representations from WSIs and a van Emde Boas tree for fast search, followed by an uncertainty-based ranking algorithm to retrieve similar WSIs. We evaluated SISH on multiple tasks and datasets with over 22,000 patient cases spanning 56 disease subtypes. We additionally demonstrate that SISH can be used to assist with the diagnosis of rare cancer types where sufficient cases may not be available to train traditional deep models.

Pre-requisites:

  • Linux (Tested on Ubuntu 18.04)
  • NVIDIA GPU (NVIDIA GeForce 2080 Ti)
  • Python (3.7.0), OpenCV (3.4.0), Openslide-python (1.1.1) and Pytorch (1.5.0). For more details, please refer to the installation guide.

Usage

The steps below show how to build the SISH pipeline on your own dataset. To reproduce the results in our paper, please refer to the reproducibility section.

Preprocessing

Step 1: Slide preparation

Create the ./DATA folder, download the whole slide images into it, and organize them into the following structure. Note that slides without a specified resolution are ignored.

DATA
└── WSI
    ├── SITE
    │   ├── DIAGNOSIS
    │   │   ├── RESOLUTION
    │   │   │   ├── slide_1
    │   │   │   ├── slide_2
    │   │   │   ├── slide_3
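
The layout above can be walked with a few lines of Python. This is a minimal sketch and not part of SISH: the function name list_slides is hypothetical, and it assumes that every file exactly four levels below DATA/WSI is a slide.

```python
from pathlib import Path


def list_slides(wsi_root):
    """Yield (site, diagnosis, resolution, slide_path) for each slide file.

    Assumes the DATA/WSI/SITE/DIAGNOSIS/RESOLUTION/slide layout shown above.
    """
    root = Path(wsi_root)
    # Exactly four path components below the root: site/diagnosis/resolution/slide
    for slide in sorted(root.glob("*/*/*/*")):
        if slide.is_file():
            site, diagnosis, resolution = slide.relative_to(root).parts[:3]
            yield site, diagnosis, resolution, slide
```

Calling list(list_slides("./DATA/WSI")) before Step 2 is a quick way to confirm that no slide sits at the wrong depth.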

Step 2: Segmentation and Patching

We use the CLAM toolbox to segment and patch whole slide images. Simply run:

python create_patches_fp.py --source ./DATA/WSI/SITE/DIAGNOSIS/RESOLUTION/ --step_size STEP_SIZE --patch_size PATCH_SIZE --seg --patch --save_dir ./DATA/PATCHES/SITE/DIAGNOSIS/RESOLUTION

We set both PATCH_SIZE and STEP_SIZE to 1024 for 20x slides and to 2048 for 40x slides. After segmentation and patching, the DATA directory will look like the following:

DATA/
├── PATCHES
│   └── SITE
│       └── DIAGNOSIS
│           └── RESOLUTION
│               ├── masks
│               ├── patches
│               ├── process_list_autogen.csv
│               └── stitches
└── WSI
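
The patch-size rule stated above can be captured in a small helper that assembles the create_patches_fp.py call for one resolution folder. This is only a sketch: the names PATCH_SIZE_BY_RESOLUTION and build_patch_command are hypothetical, and spelling the RESOLUTION folders as "20x" and "40x" is an assumption. Only flags that appear in the command above are used.

```python
# Patch and step size per magnification, as stated above (20x -> 1024, 40x -> 2048).
# The folder names "20x" and "40x" are an assumption about how RESOLUTION is spelled.
PATCH_SIZE_BY_RESOLUTION = {"20x": 1024, "40x": 2048}


def build_patch_command(source_dir, save_dir, resolution):
    """Return the create_patches_fp.py argument list for one resolution folder."""
    size = PATCH_SIZE_BY_RESOLUTION[resolution]  # KeyError for other resolutions
    return [
        "python", "create_patches_fp.py",
        "--source", source_dir,
        "--step_size", str(size),
        "--patch_size", str(size),
        "--seg", "--patch",
        "--save_dir", save_dir,
    ]


if __name__ == "__main__":
    cmd = build_patch_command(
        "./DATA/WSI/SITE/DIAGNOSIS/20x",
        "./DATA/PATCHES/SITE/DIAGNOSIS/20x",
        "20x",
    )
    # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Returning an argument list rather than a single string lets the command go straight to subprocess.run without shell quoting issues.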

Step 3: Mosaic generation

The following script generates the mosaics for each whole slide image (please first download the checkpoint trash_lgrlbp.pkl from the link in the reproducibility section):

python extract_mosaic.py --slide_data_path ./DATA/WSI/SITE/DIAGNOSIS/RESOLUTION --slide_patch_path ./DATA/PATCHES/SITE/DIAGNOSIS/RESOLUTION/patches/ --save_path ./DATA/MOSAICS/SITE/DIAGNOSIS/RESOLUTION

Once mosaic generation finishes, a few mosaics may contain artifacts (e.g., pure white patches) introduced by the generation process. We run the following script to remove them:

python artifacts_removal.py --site_slide_path ./DATA/WSI/SITE/  --site_mosaic_path ./DATA/MOSAICS/SITE

The DATA directory should look like below. We only use the mosaics in the coord_clean folder for all experiments in the paper.

DATA/
├── MOSAICS
│   └── SITE
│       └── DIAGNOSIS
│           └── RESOLUTION
│               ├── coord
│               ├── coord_clean
├── PATCHES
└── WSI

Step 4: SISH database construction

To build the database for each anatomic site, run build_index.py as below

python build_index.py --site SITE

After the script completes, it creates a database folder organized like

DATABASES/
└── SITE
    ├── index_meta
    │   └── meta.pkl
    └── index_tree
        └── veb.pkl

index_meta/meta.pkl stores the metadata for each integer key in index_tree/veb.pkl. The script also creates a folder LATENT that stores the mosaic latent codes from the VQ-VAE and the texture features from DenseNet, with the structure below

DATA/LATENT/
├── SITE
│   ├── DIAGNOSIS
│   │   ├── RESOLUTION
│   │   │   ├── densenet
│   │   │   │   ├── slide_1.pkl
│   │   │   │   └── slide_2.pkl
│   │   │   └── vqvae
│   │   │       ├── slide_1.h5
│   │   │       └── slide_2.h5
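
The split between an integer-keyed tree and a separate metadata table can be illustrated with a toy example. The sketch below is not the SISH implementation. A sorted list searched with bisect stands in for the van Emde Boas tree, and the class name, keys and metadata fields are all hypothetical. How SISH itself derives its integer keys and queries their neighbours is not described in this README.

```python
import bisect


class ToyIndex:
    """Integer keys kept in sorted order, with metadata stored separately."""

    def __init__(self):
        self.keys = []  # sorted integer keys (stand-in for index_tree/veb.pkl)
        self.meta = {}  # key -> list of records (stand-in for index_meta/meta.pkl)

    def insert(self, key, record):
        if key not in self.meta:
            bisect.insort(self.keys, key)
            self.meta[key] = []
        self.meta[key].append(record)

    def predecessor(self, key):
        """Largest stored key strictly smaller than `key`, or None."""
        i = bisect.bisect_left(self.keys, key)
        return self.keys[i - 1] if i > 0 else None

    def successor(self, key):
        """Smallest stored key strictly larger than `key`, or None."""
        i = bisect.bisect_right(self.keys, key)
        return self.keys[i] if i < len(self.keys) else None


if __name__ == "__main__":
    index = ToyIndex()
    index.insert(120, {"slide": "slide_1", "diagnosis": "DIAGNOSIS"})
    index.insert(450, {"slide": "slide_2", "diagnosis": "DIAGNOSIS"})
    nearest = index.predecessor(400)
    print(nearest, index.meta[nearest])
```

Keeping the tree limited to integers and looking metadata up by key afterwards means the search structure stays small, while each hit can still be traced back to its slide.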

Step 5: Search the whole database

Run the script below to get each query's results in the database.

python main_search.py --site SITE --db_index_path ./DATABASES/SITE/index_tree/veb.pkl --index_meta_path ./DATABASES/SITE/index_meta/meta.pkl

It stores the results for each query and the search time in two separate folders:

QUERY_RESULTS/
└── SITE
    └── results.pkl
QUERY_SPEED/
├── SITE
│   └── speed_log.txt
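
The internal layout of results.pkl is not documented here, so a cautious first step is to load it and look at its top-level type. This is a minimal sketch using only the standard pickle module. The function name describe_pickle is hypothetical, and loading may additionally require the SISH code to be importable if the file contains custom classes.

```python
import pickle


def describe_pickle(path):
    """Load a pickle file and return (type name, number of top-level entries or None)."""
    with open(path, "rb") as f:
        obj = pickle.load(f)  # only unpickle files from a source you trust
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size
```

For example, describe_pickle("QUERY_RESULTS/SITE/results.pkl") reports the container type and how many entries it holds.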

Step 6: Evaluation

Run eval.py to get the performance results, which are printed directly to the screen when it finishes.

python eval.py --site SITE --result_path QUERY_RESULTS/SITE/results.pkl

Optional: SISH for patch retrieval

If you would like to use SISH for the patch retrieval task, please organize your data into the structure below

./DATA_PATCH/
├── All
├── summary.csv

where all patch files are in the folder All and the summary.csv file stores each patch name and label in the format below

patch1,y1
patch2,y2
patch3,y3
...
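
Before building the database, it can help to check that summary.csv really follows the two-column format above. The sketch below uses the standard csv module. The function name read_summary is hypothetical, and it assumes the file has no header row.

```python
import csv


def read_summary(csv_path):
    """Return a list of (patch_name, label) pairs from a header-less summary.csv."""
    pairs = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            if len(row) != 2:
                raise ValueError(f"expected 'patch,label', got {row!r}")
            pairs.append((row[0].strip(), row[1].strip()))
    return pairs
```

For example, collections.Counter(label for _, label in read_summary("./DATA_PATCH/summary.csv")) gives the number of patches per label.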

Once prepared, run the following:

Build database:

python build_index_patch.py --exp_name EXP_NAME --patch_label_file ./DATA_PATCH/summary.csv --patch_data_path ./DATA_PATCH/All

where EXP_NAME is a custom name for this database. You can reproduce our kather100k results by setting EXP_NAME=kather100k. Note that if you use your own patch data, you should implement your own method, starting from line 236, to scale your patches to 1024x1024.

Search:

python main_search_patch.py --exp_name EXP_NAME --patch_label_file ./DATA_PATCH/summary.csv --patch_data_path ./DATA_PATCH/All --db_index_path DATABASES_PATCH/EXP_NAME/index_tree/veb.pkl --index_meta_path DATABASES_PATCH/EXP_NAME/index_meta/meta.pkl

Evaluation:

python eval_patch.py --result_path QUERY_RESULTS/PATCH/EXP_NAME/results.pkl

Reproducibility

To reproduce the results in our paper, please download the checkpoints, preprocessed latent codes and pre-built databases from the link. The preprocessed latent codes and pre-built databases are exactly what Steps 1-4 produce if you start everything from scratch. Once downloaded, unzip DATABASES.zip and LATENT.ZIP under ./SISH and ./SISH/DATA/ respectively. The folder structures should look like the ones in Step 4. Run the commands in Step 5 and Step 6 to reproduce the results for each site.

To reproduce the anatomic site retrieval, run

python main_search.py --site organ --db_index_path DATABASES/organ/index_tree/veb.pkl --index_meta_path DATABASES/organ/index_meta/meta.pkl

and

python eval.py --site organ --result_path QUERY_RESULTS/organ/results.pkl
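
When reproducing several sites, the search and evaluation commands can be generated rather than typed by hand. This is a sketch under the assumption that every site follows the same DATABASES/SITE and QUERY_RESULTS/SITE layout as the organ example above. The function name search_and_eval_commands is hypothetical.

```python
def search_and_eval_commands(site):
    """Return the main_search.py and eval.py argument lists for one site."""
    search = [
        "python", "main_search.py", "--site", site,
        "--db_index_path", f"DATABASES/{site}/index_tree/veb.pkl",
        "--index_meta_path", f"DATABASES/{site}/index_meta/meta.pkl",
    ]
    evaluate = [
        "python", "eval.py", "--site", site,
        "--result_path", f"QUERY_RESULTS/{site}/results.pkl",
    ]
    return search, evaluate


if __name__ == "__main__":
    for site in ["organ"]:
        for cmd in search_and_eval_commands(site):
            # Print only; pass cmd to subprocess.run(cmd, check=True) to execute it.
            print(" ".join(cmd))
```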

Note that the speed results may differ from those in the paper if your CPU is not equivalent to ours (AMD Ryzen Threadripper 3970X 32-Core Processor).

Funding

This work was funded by NIH NIGMS R35GM138216.

Reference

If you find our work useful in your research or if you use parts of this code please consider citing our paper:

Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w

@article{lu2021data,
  title={Data-efficient and weakly supervised computational pathology on whole-slide images},
  author={Lu, Ming Y and Williamson, Drew FK and Chen, Tiffany Y and Chen, Richard J and Barbieri, Matteo and Mahmood, Faisal},
  journal={Nature Biomedical Engineering},
  volume={5},
  number={6},
  pages={555--570},
  year={2021},
  publisher={Nature Publishing Group}
}