Race and ethnicity Imputation from Disease history with Deep LEarning (RIDDLE)

RIDDLE (Race and ethnicity Imputation from Disease history with Deep LEarning) is an open-source Python 2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE lets you (1) build models for estimating race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The RIDDLE library implements the methods introduced in "RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning" (PLOS Computational Biology, 2018).

Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large and high-dimensional datasets in a performant fashion. RIDDLE trains models efficiently by running on a parallelized TensorFlow/Theano backend, and avoids memory overflow by preprocessing data in conjunction with batch-wise training.
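As a rough illustration of the batch-wise idea (a minimal sketch, not RIDDLE's actual implementation), the generator below reads records lazily and preprocesses one batch at a time, so only `batch_size` records are held in memory at once. The names `iter_batches` and `preprocess` are hypothetical and do not come from the RIDDLE codebase.

```python
from itertools import islice


def iter_batches(records, batch_size, preprocess=None):
    """Lazily yield preprocessed batches from any iterable of records.

    Hypothetical helper for illustration; not part of the RIDDLE API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull at most batch_size records; the rest stay unread.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        if preprocess is not None:
            batch = [preprocess(record) for record in batch]
        yield batch


if __name__ == "__main__":
    # Stand-in for lines streamed from a large data file.
    lines = ("record %d" % i for i in range(5))
    for batch in iter_batches(lines, batch_size=2, preprocess=str.upper):
        print(batch)
```

Because `records` can be any iterable, including an open file handle, the full dataset never has to be loaded before training starts.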

RIDDLE uses Keras to specify and train the underlying deep neural networks, and DeepLIFT to compute feature-to-class contribution scores. The current RIDDLE Python module works with either TensorFlow or Theano as the backend to Keras. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own model_module files!
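To make "binary-encoded features and targets" concrete, here is a minimal sketch. It assumes each record is a list of feature codes and each target is a single class label. The function names, example codes, and class labels are made up for illustration and are not taken from RIDDLE.

```python
def encode_features(codes, feature_index):
    """Multi-hot encode one record: 1 where a known code is present, else 0."""
    row = [0] * len(feature_index)
    for code in codes:
        position = feature_index.get(code)
        if position is not None:  # codes outside the vocabulary are skipped
            row[position] = 1
    return row


def encode_target(label, class_index):
    """One-hot encode a single class label."""
    row = [0] * len(class_index)
    row[class_index[label]] = 1
    return row


if __name__ == "__main__":
    feature_index = {"code_a": 0, "code_b": 1, "code_c": 2}  # made-up codes
    class_index = {"class_x": 0, "class_y": 1}               # made-up labels
    print(encode_features(["code_c", "code_a"], feature_index))
    print(encode_target("class_y", class_index))
```

Vectors like these are the kind of input and output a multi-layer perceptron consumes; a custom architecture could use a different encoding.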

Documentation

Please visit riddle.ai.

Dependencies

Python Libraries:

  • Keras (keras)
  • DeepLIFT (deeplift, available on GitHub)
  • TensorFlow (tensorflow) or Theano (theano)
  • scikit-learn (sklearn)
  • NumPy (numpy)
  • SciPy (scipy)
  • Matplotlib (matplotlib)
  • h5py (h5py)

General:

  • HDF5

Unit testing

Execute the following command in the outer repository folder (not riddle/riddle):

% PYTHONPATH=. pytest

FAQ

What's the easiest way to install RIDDLE?

You can clone the GitHub repo and go from there:

% git clone --recursive https://github.com/jisungk/riddle.git
% cd riddle
% pip install -r requirements.txt

How can I run the RIDDLE pipeline?

Execute the following scripts.

% python parameter_search.py  # run parameter tuning
% python riddle.py            # train and evaluate the model
% python interpret_riddle.py  # interpret the trained model
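
If you prefer to launch the three steps from a single script, one possible wrapper is sketched below. It assumes the scripts are run from the repository folder without extra arguments, as in the commands above; `build_commands` and `run_pipeline` are hypothetical helper names, not part of RIDDLE. Pass the interpreter you normally use for RIDDLE as `python`.

```python
import subprocess

# Script names as listed in this README, in execution order.
PIPELINE = ["parameter_search.py", "riddle.py", "interpret_riddle.py"]


def build_commands(python="python", scripts=PIPELINE):
    """Return one argument list per pipeline step."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, runner=subprocess.check_call):
    """Run each step in order; check_call raises on a non-zero exit code,
    so later steps do not run after a failure."""
    for command in commands:
        runner(command)


if __name__ == "__main__":
    run_pipeline(build_commands())
```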

What is the default format for data files?

Please refer to the example data file dummy.txt and the accompanying README in the _data directory.

Authors

Ji-Sung Kim
Princeton University
hello (at) jisungkim.com (technical inquiries)

Xin Gao, Associate Professor
King Abdullah University of Science and Technology

Andrey Rzhetsky, Edna K. Papazian Professor
University of Chicago
andrey.rzhetsky (at) uchicago.edu (research inquiries)

License & Attribution

All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).

Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the notice and license files, and cite:

@article{10.1371/journal.pcbi.1006106,
    author = {Kim, Ji-Sung AND Gao, Xin AND Rzhetsky, Andrey},
    journal = {PLOS Computational Biology},
    publisher = {Public Library of Science},
    title = {RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning},
    year = {2018},
    month = {04},
    volume = {14},
    url = {https://doi.org/10.1371/journal.pcbi.1006106},
    pages = {1-15},
    number = {4},
    doi = {10.1371/journal.pcbi.1006106}
}