hms-dbmi / Hic Data Analysis Bootcamp

Licence: mit

Workshop on measuring, analyzing, and visualizing the 3D genome with Hi-C data.

Labels

jupyter-notebook data-visualization

Hi-C Data Analysis Bootcamp

A tutorial on measuring, analyzing, and visualizing the 3D genome with Hi-C provided by Harvard, MIT, and UMassMed.

📢 Slides, code, and data are available for you to rerun the analyses!

Introduction

The 4D Nucleome Data Coordination and Integration Center and the Center for 3D Structure and Physics of the Genome hosted a Hi-C data analysis bootcamp at Harvard Medical School on May 8, 2018. This repo contains the material for this bootcamp. Below, you can find more information on how to walk through the hands-on sessions offline.

Files in this repository

Tutorial Part 1 (Hi-C Protocol): Slides PDF | PPTX
Tutorial Part 2 (From fastqs to contact matrices): Slides HTML
Tutorial Part 3 (From contact matrices to biology): Slides PDF | PPTX
Tutorial Part 4 (Hi-C Data Visualization - HiGlass): Slides HTML
Tutorial Part 5 (Hi-C Data Visualization - HiPiler): Slides PDF | HTML

Presenters

Johan Gibcus, Research Instructor, University of Massachusetts Medical School
Nezar Abdennur, PhD student, MIT
Soo Lee, Senior Bioinformatics Scientist, Harvard Medical School
Peter Kerpedjiev, Postdoctoral Research Fellow, Harvard Medical School
Fritz Lekschas, PhD Student, Harvard University
Leonid Mirny, Professor, MIT

Organizers

Burak Alver, Scientific Project Manager, Harvard Medical School
Nils Gehlenborg, Assistant Professor, Harvard Medical School
Peter Park, Professor, Harvard Medical School

Motivation and Objectives

Due in large part to the explanatory power of chromosome organization in gene regulation, its association with disease and disorder, and the unanswered questions regarding the mechanisms behind its maintenance and function, the 3D structure and function of the genome are increasingly becoming a target of scientific scrutiny. With efforts such as the 4D Nucleome Project and ENCODE 4 already beginning to generate large amounts of data, the ability to analyze and visualize these data will be a valuable asset to any computational biologist tasked with interpreting experimental results.

The objectives of this tutorial are:

To introduce the theoretical concepts related to 3D genome data analysis.
To familiarize participants with the data types, analysis pipeline, and common tools for analysis and visualization of 3D genome data.
To provide hands-on experience in data analysis by walking through some common use cases of existing tools for data analysis and visualization.

After the workshop, participants should be able to obtain, process, analyze, and visualize 3D genome data on their own, and to understand some of the logic, motivation, and pitfalls associated with common operations such as matrix balancing and multi-resolution visualization.

The subject matter and practical exercises presented in this tutorial will be accessible to a broad audience. Prior experience with next generation sequencing and the data it produces will be helpful for understanding the subsequent processing steps used to derive contact maps as well as some of the artifacts that can arise during data processing.

The material will be most useful to computational biologists and biologists working on genomics-related topics.

Student Requirements

A server will be set up for students with all the required software.
Windows users, please install PuTTY (for ssh).

Agenda

09:00 - 09:10 Introduction and Overview (Peter Park and Burak Alver, Harvard)

09:10 - 10:30 Hi-C Protocol (Johan Gibcus, UMass)

10:30 - 10:45 Break

10:45 - 12:15 From fastqs to contact matrices (Soohyun Lee, Harvard)

12:15 - 13:00 Lunch

13:00 - 14:00 From contact matrices to biology (Nezar Abdennur, MIT)

14:00 - 15:00 Hi-C Data Visualization - HiGlass (Peter Kerpedjiev, Harvard)

15:00 - 15:15 Break

15:15 - 16:00 Hi-C Data Visualization - HiPiler (Fritz Lekschas, Harvard)

16:00 - 17:00 Keynote Speaker - Leonid Mirny, MIT

Instructor Bios

Johan Gibcus

Johan Gibcus is a Research Instructor at the University of Massachusetts Medical School. He has not only used but also refined the Hi-C protocol to answer important biological questions about chromosome organization and replication. Web: http://www.dekkerlab.org/

Soo Lee

Soo Lee is a Senior Bioinformatics Scientist in the Department of Biomedical Informatics at Harvard Medical School. She is creating cloud-based pipelines for Hi-C and other genomic data and developing infrastructure for automation of such pipelines as part of the 4D Nucleome Data Coordination and Integration Center. Web: compbio.hms.harvard.edu/people/soohyun-lee

Nezar Abdennur

Nezar Abdennur is a PhD candidate in Computational and Systems Biology at MIT. His research focuses on the determinants of 3D genome organization and the development of tools for dealing with large Hi-C datasets. Twitter: @nv1ctus Web: nvictus.me

Peter Kerpedjiev

Peter Kerpedjiev is a postdoctoral researcher working on creating tools (such as HiGlass) for visualizing large genomic data sets. Twitter: @pkerpedjiev Web: emptypipes.org

Fritz Lekschas

Fritz is a PhD student working on biomedical information visualization with a focus on large multiscale genomic data sets. He created tools such as HiPiler and Scalable Insets. Twitter: @flekschas Web: lekschas.de

Leonid Mirny

Leonid Mirny is a professor at MIT's Institute for Medical Engineering & Science. His lab studies the three dimensional organization of chromosomes using a combination of computational analysis and simulation. Twitter: @leonidmirny Web: mirnylab.mit.edu

Pointers for Offline Walk-through

During the bootcamp, users were given access to Linux servers where

docker was installed,
conda was installed,
a conda environment was set up with a number of dependencies installed, including Jupyter Notebook,
higlass-manage was installed,
and sample data was downloaded.

You can set up a similar environment and walk through the hands-on sessions of the bootcamp by following the instructions below. Allow 30G of storage for all files used in the tutorial.
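
Before downloading anything, it may help to confirm that the target filesystem has enough room. The following is a minimal sketch and not part of the bootcamp material: the helper name `has_free_space` is made up for illustration, and it assumes a POSIX `df` and `awk` are available.

```shell
# Hypothetical helper (not from the bootcamp): succeed if the filesystem
# holding DIR has at least GB gigabytes available.
has_free_space() {
    dir="$1"
    need_gb="$2"
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Field 4 of the second line is the available space in kilobytes.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Check the home directory, where the tutorial data is placed by default.
target_dir="${HOME:-.}"
if has_free_space "$target_dir" 30; then
    echo "enough space for the tutorial files under $target_dir"
else
    echo "less than 30G free under $target_dir"
fi
```

If you plan to keep the data somewhere else, pass that directory instead of the home directory.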

From fastqs to contact matrices

Install docker, if you have not already done so. (Docker is a lighter alternative to virtual machines.)
Pull the docker image: docker pull duplexa/4dn-hic:v42. This docker image contains a number of pre-installed software tools for Hi-C data processing.
Download the sample data for this session under your home directory to "~/data/" (or edit the commands on the slides accordingly, if you prefer a different directory).

mkdir -p ~/data
cd ~/data/
# input fastq files
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/input_R1.fastq.gz
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/input_R2.fastq.gz
gunzip input_R1.fastq.gz
gunzip input_R2.fastq.gz
# bwa genome index
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/hg38.bwaIndex.tgz
tar -xzf hg38.bwaIndex.tgz
rm hg38.bwaIndex.tgz
# chromsizes
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/hg38.mainonly.chrom.size
# prebaked output files
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/prebaked.tgz
tar -xzf prebaked.tgz
rm prebaked.tgz
# move back a directory
cd ..

Now, you should be able to follow slides 1 through 23 of the tutorial. When you are finished, exit the docker container with Ctrl-d before proceeding to the next part.
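
The slides give the exact commands for this session. As a rough sketch of how such a container is commonly entered, the helper below prints a `docker run` command that mounts the downloaded data into the container. The helper name `hic_container_cmd` and the in-container mount point `/data` are assumptions made for illustration; use the paths from the slides if they differ.

```shell
# Hypothetical helper: print (not run) a docker command that opens a shell
# in the bootcamp image with the data directory mounted.
hic_container_cmd() {
    host_dir="${1:-$HOME/data}"
    # -it: interactive terminal; --rm: remove the container on exit;
    # -v HOST:CONTAINER: bind-mount the host data directory.
    # The mount point /data inside the container is an assumption.
    echo "docker run -it --rm -v $host_dir:/data duplexa/4dn-hic:v42 bash"
}

# Inspect the command first, then copy it into the terminal to start the container.
hic_container_cmd "$HOME/data"
```

Printing the command rather than running it makes it easy to check the mount paths before starting the container. Once inside, Ctrl-d ends the shell and with it the container.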

Working in a cluster without docker

If you are working on a high-performance computing cluster, you may not be allowed to install Docker. Instead, you can find the recipe for the docker image used above here. The exact configuration of the docker image can be seen in the dockerfile. You can get information on the bioinformatics software installed inside the docker image in the download.sh file.

From contact matrices to biology

Install conda, if you have not already done so. Conda is an open source package management tool that allows you to create separate environments.

Clone this repo and set up the environment.

git clone https://github.com/hms-dbmi/hic-data-analysis-bootcamp
cd hic-data-analysis-bootcamp
git pull
#you may need some of the following in case you have an issue creating an environment
#conda update --all -y
#sudo yum install -y hg
#conda install gcc
conda env create -n bootcamp -f environment.yml

Download the sample data for this session into the pre-existing "notebooks/data" directory (or edit the commands on the slides accordingly, if you prefer a different directory).

# from the hic-data-analysis-bootcamp directory we just made
cd notebooks/data
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL.1000.mcool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL.10000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL.20000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL.40000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL.100000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/TAM.1000.mcool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/TAM.10000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/TAM.20000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/TAM.40000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/TAM.100000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR.1000.mcool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR.10000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR.20000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR.40000.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR.100000.cool

wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/CtcfCtrl.mm9__VS__InputCtrl.mm9.narrowPeak_with_motif.txt.gz
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/GSM1551552_HIC003_merged_nodups.txt.subset.gz
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL_R1.nodups.pairs.gz
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/NIPBL_R1.nodups.pairs.gz.px2
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/Rao2014-GM12878-MboI-allreps-filtered.1000kb.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/Rao2014-GM12878-MboI-allreps-filtered.5kb.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR_R1.nodups.pairs.gz
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/UNTR_R1.nodups.pairs.gz.px2
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/b37.chrom.sizes.reduced
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/ctcf-sites.paired.300kb_flank10kb.tsv
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/hg19.chrom.sizes.reduced
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/mm9.chrom.sizes.reduced
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/mm9.fa
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/ranked_TSS.tsv
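
The fifteen NIPBL, TAM, and UNTR files above follow a regular naming pattern (three samples, five resolutions), so their URLs can also be generated in a loop. This is only a convenience sketch: the function name `bootcamp_cool_urls` is made up for illustration, and the remaining files in the list still need to be fetched individually.

```shell
# Hypothetical helper: print the URLs of the 15 NIPBL/TAM/UNTR cooler files
# listed above, one per line.
bootcamp_cool_urls() {
    base="https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp"
    for sample in NIPBL TAM UNTR; do
        # The 1000 entry uses the .mcool extension; the others are .cool.
        echo "$base/$sample.1000.mcool"
        for res in 10000 20000 40000 100000; do
            echo "$base/$sample.$res.cool"
        done
    done
}

# Print the list for inspection. To download instead, pipe it to wget:
#   bootcamp_cool_urls | wget -nc -i -
bootcamp_cool_urls
```

With `-i -`, wget reads the URL list from standard input, and `-nc` skips files that are already present, so the command can be rerun after an interrupted download.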

Go back to the "notebooks" directory and activate the environment to run the jupyter notebook.
```
cd ..
source activate bootcamp  # with conda 4.4 or later, "conda activate bootcamp" is the equivalent command
jupyter notebook
```

If you're running it on your local machine, the notebook will open at http://localhost:8888. You may have to enter the token that Jupyter displays when it starts up. Follow the steps in the notebooks starting with the top one, named "00_intro_cooler-cli".

HiGlass

Install and start docker on your machine.

docker pull gehlenborglab/higlass:v0.2.63  # higlass
pip install higlass-manage --upgrade

Download the sample data.

wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/Schwarzer-et-al-2017-NIPBL.multi.cool
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/Schwarzer-et-al-2017-RNAseq-minus.bw
wget https://s3.amazonaws.com/4dn-dcic-public/hic-data-analysis-bootcamp/Schwarzer-et-al-2017-UNTR.multi.cool

Now, you should be able to follow slides 24 through 59 of the tutorial.

Resources

Software

bwa and SAM spec
pairsamtools
pairix
cooler and docs
HiGlass, source code, and docs
HiPiler, source code, docs, project page

Package and environment management

Papers

Imakaev, Maxim, et al. "Iterative correction of Hi-C data reveals hallmarks of chromosome organization." Nature methods 9.10 (2012): 999-1003. doi:10.1038/nmeth.2148
Lajoie, Bryan R., Job Dekker, and Noam Kaplan. "The Hitchhiker’s guide to Hi-C analysis: practical guidelines." Methods 72 (2015): 65-75. doi:10.1016/j.ymeth.2014.10.031
Kerpedjiev, Peter, et al. "HiGlass: Web-based Visual Comparison And Exploration Of Genome Interaction Maps." bioRxiv. doi:10.1101/121889
Lekschas, Fritz, et al. "HiPiler: Visual Exploration Of Large Genome Interaction Matrices With Interactive Small Multiples." IEEE Transactions on Visualization and Computer Graphics 24.1: 522-531. doi:10.1109/TVCG.2017.2745978
Belaghzal H, et al. "Hi-C 2.0: An optimized Hi-C procedure for high-resolution genome-wide mapping of chromosome conformation." Methods (2017). doi:10.1016/j.ymeth.2017.04.004
Golloshi R, et al. "Iteratively improving Hi-C experiments one step at a time." Methods (2018). doi:10.1016/j.ymeth.2018.04.033
Oddes, Sivan, et al. "Three invariant Hi-C interaction patterns: applications to genome assembly." bioRxiv 306076. doi:10.1101/306076
