microsoft / Elevation

Licence: MIT
End-to-end guide design for CRISPR/Cas9 with machine learning

Elevation

Off-target effects of the CRISPR-Cas9 system can lead to suboptimal gene editing outcomes and are a bottleneck in its development. Here, we introduce two interdependent machine learning models for the prediction of off-target effects of CRISPR-Cas9. The approach, which we named Elevation, scores individual guide–target pairs, and aggregates such scores into a single, overall summary guide score.

See our official project page for more detail.

Publications

Please cite this paper if using our predictive model:

Jennifer Listgarten*, Michael Weinstein*, Benjamin P. Kleinstiver, Alexander A. Sousa, J. Keith Joung, Jake Crawford, Kevin Gao, Luong Hoang, Melih Elibol, John G. Doench*, Nicolo Fusi*. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nature Biomedical Engineering, 2018.

(* = equal contributions/corresponding authors)

Dependencies

Software dependencies

  1. Install Anaconda >= 4.1.1: https://www.continuum.io/downloads

Download and process data dependencies

  1. First, download and process the necessary public data files using the CRISPR/download_data.sh script. On Windows, you can either run commands equivalent to the script's wget commands by hand, or run the script itself using Bash on Windows, Cygwin, etc.

    After running the script, your directory structure should look like:

elevation/
    CRISPR/
        data/
            offtarget/
                Haeussler/
                CD33_data_postfilter.xlsx
                nbt.3117-S2.xlsx
                STable 18 CD33_OffTargetdata.xlsx*
                STable 19 FractionActive_dlfc_lookup.xlsx*
                Supplementary Table 10.xlsx
        gene_sequences/
            CD33_sequence.txt
        guideseq/
            guideseq.py
            guideseq_unique.txt
    cache/
    CHANGELOG.md
    elevation/
    ...
  2. One additional data file must be generated via a genome search, using our Elevation-search (aka "dsNickFury") software, which must be installed separately. The repository is located at https://github.com/michael-weinstein/dsNickFury3PlusOrchid.

    Note: If you are not planning to run a genome search for off-targets, you do not need to follow the data-dependency instructions in the dsNickFury documentation. Follow these steps instead:

    • Download and unzip the hg38 index (linked in the dsNickFury README) into dsNickFury/dsNickFury3PlusOrchid
    • Install anaconda2 into dsNickFury/dependencies
    • Use anaconda2 to create a Python 3 environment (e.g. dsNickFury/dependencies/anaconda2/bin/conda create -n dsNickFury python==3)
    • Edit the dsNickFury/dsNickFury3PlusOrchid/settings.py file so that network_root points to the directory containing dsNickFury, and anaconda_root points to the location of the anaconda2 install
    • Edit the CRISPR/guideseq/guideseq.py file so that DSNF_DIRECTORY points to the dsNickFury/dsNickFury3PlusOrchid directory

    At this point, you should be able to run CRISPR/guideseq/guideseq.py. This takes a while to run (about 8 hours on a desktop).

    Once the script finishes, there should be a file called guideseq_unique_MM6_end0_lim999999999.hdf5 in the CRISPR/guideseq directory.

  3. Your directory structure should now look something like this:

elevation/
    CRISPR/
        data/
            offtarget/
                Haeussler/
                CD33_data_postfilter.xlsx
                nbt.3117-S2.xlsx
                STable 18 CD33_OffTargetdata.xlsx*
                STable 19 FractionActive_dlfc_lookup.xlsx*
                Supplementary Table 10.xlsx
        gene_sequences/
            CD33_sequence.txt
        guideseq/
            guideseq.py
            guideseq_unique.txt
            guideseq_unique_MM6_end0_lim999999999.hdf5
            ...
    cache/
    CHANGELOG.md
    elevation/
    ...

You can now install the elevation dependencies and run the software.
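
Before installing, it can help to confirm that the data files are where the listing above says they should be. The following is a minimal sketch, not part of Elevation: the helper name `find_missing` and the `EXPECTED_PATHS` list are hypothetical, and the list covers only a subset of the entries in the listing (the two entries marked with `*` are left out because their exact on-disk names are not clear from the listing).

```python
import os

# Paths relative to the top-level elevation/ checkout, copied from the
# directory listing above. This list is an illustrative subset, not an
# authoritative manifest.
EXPECTED_PATHS = [
    os.path.join("CRISPR", "data", "offtarget", "Haeussler"),
    os.path.join("CRISPR", "data", "offtarget", "CD33_data_postfilter.xlsx"),
    os.path.join("CRISPR", "data", "offtarget", "nbt.3117-S2.xlsx"),
    os.path.join("CRISPR", "data", "offtarget", "Supplementary Table 10.xlsx"),
    os.path.join("CRISPR", "gene_sequences", "CD33_sequence.txt"),
    os.path.join("CRISPR", "guideseq", "guideseq.py"),
    os.path.join("CRISPR", "guideseq", "guideseq_unique.txt"),
    os.path.join("CRISPR", "guideseq",
                 "guideseq_unique_MM6_end0_lim999999999.hdf5"),
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    # Run from the top-level elevation/ directory.
    for path in find_missing("."):
        print("missing: " + path)
```

An empty result means every listed path was found; it does not verify the contents of the files.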

Install / Develop

  1. Create conda env for elevation: conda create -n elevation python=2.7
  2. Activate conda env:
    • (windows) activate elevation
    • (linux) source activate elevation
  3. Install Azimuth version 2.0.0: pip install git+https://github.com/MicrosoftResearch/Azimuth.git
  4. Overwrite some of the Azimuth dependencies, since Elevation uses different versions:
    • conda install pytables
    • conda install scikit-learn==0.18.1
    • pip install pandas==0.19.1 (installing these packages via conda/pip avoids recompiling them from source)
  5. Install/Develop elevation:
    • To install, python setup.py install
    • To develop, python setup.py develop

Test installation

Make sure everything is set up properly by running the following command from the root directory of the repository.

python -m pytest tests

or

nosetests tests

Use

Guide Sequence Prediction

import elevation.load_data
from elevation.cmds.predict import Predict

# load data
num_x = 100
roc_data, roc_Y_bin, roc_Y_vals = elevation.load_data.load_HauesslerFig2(1)
wildtype = list(roc_data['30mer'])[:num_x]
offtarget = list(roc_data['30mer_mut'])[:num_x]

# initialize predictor
p = Predict()

# run prediction
preds = p.execute(wildtype, offtarget)

# preds is a dictionary of the form {'linear-raw-stacker': [...], 'CFD': [...]}
for i in range(num_x):
    # collect "model=score" strings for the i-th guide/off-target pair
    scores = [name + "=" + str(values[i]) for name, values in preds.items()]
    print(wildtype[i], offtarget[i], scores)
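
Because preds maps each model name to a list of scores aligned with the input pairs, the results are easy to reshape for inspection. The sketch below is illustrative only: `rank_pairs` is a hypothetical helper, not an Elevation function, and it assumes each entry of preds is a flat sequence with one number per guide/off-target pair. The sequences and scores in the usage lines are made-up placeholders, not real model output.

```python
def rank_pairs(wildtype, offtarget, preds, key="linear-raw-stacker"):
    """Pair each (wildtype, offtarget) with its scores, sorted by preds[key].

    Returns a list of (wildtype, offtarget, scores) tuples, highest
    preds[key] first, where scores maps model name to that pair's score.
    """
    rows = []
    for i, (wt, mut) in enumerate(zip(wildtype, offtarget)):
        # gather the i-th score from every model
        scores = dict((name, values[i]) for name, values in preds.items())
        rows.append((wt, mut, scores))
    return sorted(rows, key=lambda row: row[2][key], reverse=True)


# Placeholder data standing in for the output of Predict().execute(...)
example_preds = {"linear-raw-stacker": [0.1, 0.9, 0.5],
                 "CFD": [0.2, 0.3, 0.4]}
ranked = rank_pairs(["wt_a", "wt_b", "wt_c"],
                    ["mut_a", "mut_b", "mut_c"],
                    example_preds)
for wt, mut, scores in ranked:
    print(wt, mut, scores)
```

Passing key="CFD" sorts by the CFD score instead.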

Aggregation Prediction

import numpy as np
import pickle
import elevation.load_data
from elevation.cmds.predict import Predict
from elevation import settings
from elevation import aggregation

# load data
num_x = 100
roc_data, roc_Y_bin, roc_Y_vals = elevation.load_data.load_HauesslerFig2()
wildtype = list(roc_data['30mer'])[:num_x]
offtarget = list(roc_data['30mer_mut'])[:num_x]

# initialize guide seq predictor
p = Predict()

# run prediction
preds = p.execute(wildtype, offtarget)

# load aggregation model (pickle files must be opened in binary mode)
with open(settings.agg_model_file, 'rb') as fh:
    final_model, other = pickle.load(fh)

# compute aggregated score
isgenic = np.zeros(num_x, dtype=bool)
result = aggregation.get_aggregated_score(
         preds['linear-raw-stacker'],
         preds['CFD'],
         isgenic,
         final_model)
print(result)

Recomputing Models

Models are persisted as pickle files and, under certain circumstances, may need to be recomputed. Elevation models depend on the CRISPR repository. To recompute models, run the following command.

elevation-fit --crispr_repo_dir /home/melih/dev/CRISPR

where /home/melih/dev/CRISPR corresponds to the directory that contains the CRISPR repository you'd like to use to recompute the models.

New Fixtures

After changing the models, run elevation-fixtures to generate new fixtures (data used to test prediction consistency).

Run python -m pytest tests to make sure tests are still passing.

Settings

If you'd like to reconfigure the default location of CRISPR, the temp dir in which pickles are stored, etc., copy elevation/settings_template.py to elevation/settings.py and edit elevation/settings.py before installation. If elevation/settings.py does not exist at install time, then elevation/settings_template.py is used to create elevation/settings.py.
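
The copy-if-missing behavior described above can be sketched as follows. This is an illustration of the rule, not the code Elevation's installer actually runs; `ensure_settings` is a hypothetical name, and the only file names assumed are the two mentioned in the paragraph.

```python
import os
import shutil


def ensure_settings(package_dir):
    """Create settings.py from settings_template.py if it does not exist.

    `package_dir` is the directory holding both files (elevation/ inside
    the repository). Returns True if a copy was made, False if an existing
    settings.py was left untouched.
    """
    template = os.path.join(package_dir, "settings_template.py")
    target = os.path.join(package_dir, "settings.py")
    if os.path.exists(target):
        # an existing, possibly edited, settings.py is never overwritten
        return False
    shutil.copyfile(template, target)
    return True
```

Calling ensure_settings("elevation") from the repository root and then editing elevation/settings.py before installation has the same effect as the manual copy.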

Contacting us

You can submit bug reports using the GitHub issue tracker. If you have any other questions, please contact us at [email protected].
