aidenlab / Juicer

License: MIT
A One-Click System for Analyzing Loop-Resolution Hi-C Experiments

Programming Languages

shell

Projects that are alternatives of or similar to Juicer

Artemis
Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation
Stars: ✭ 135 (-33.5%)
Mutual labels:  genomics
Goleft
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
Stars: ✭ 175 (-13.79%)
Mutual labels:  genomics
Ribbon
A genome browser that shows long reads and complex variants better
Stars: ✭ 184 (-9.36%)
Mutual labels:  genomics
Awesome Bioinformatics Benchmarks
A curated list of bioinformatics benchmarking papers and resources.
Stars: ✭ 142 (-30.05%)
Mutual labels:  genomics
Viral Ngs
Viral genomics analysis pipelines
Stars: ✭ 150 (-26.11%)
Mutual labels:  genomics
Wgsim
Reads simulator
Stars: ✭ 178 (-12.32%)
Mutual labels:  genomics
Octopus
Bayesian haplotype-based mutation calling
Stars: ✭ 131 (-35.47%)
Mutual labels:  genomics
Intermine
A powerful open source data warehouse system
Stars: ✭ 195 (-3.94%)
Mutual labels:  genomics
Glow
An open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-21.67%)
Mutual labels:  genomics
Ideogram
Chromosome visualization for the web
Stars: ✭ 181 (-10.84%)
Mutual labels:  genomics
Biomartr
Genomic Data Retrieval with R
Stars: ✭ 144 (-29.06%)
Mutual labels:  genomics
Vcfr
Tools to work with variant call format files
Stars: ✭ 149 (-26.6%)
Mutual labels:  genomics
Deep Rules
Ten Quick Tips for Deep Learning in Biology
Stars: ✭ 179 (-11.82%)
Mutual labels:  genomics
Hgvs
Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
Stars: ✭ 138 (-32.02%)
Mutual labels:  genomics
Genometools
GenomeTools genome analysis system.
Stars: ✭ 186 (-8.37%)
Mutual labels:  genomics
Hifiasm
Hifiasm: a haplotype-resolved assembler for accurate Hifi reads
Stars: ✭ 134 (-33.99%)
Mutual labels:  genomics
Roary
Rapid large-scale prokaryote pan genome analysis
Stars: ✭ 176 (-13.3%)
Mutual labels:  genomics
Sequenceserver
Intuitive local web frontend for the BLAST bioinformatics tool
Stars: ✭ 198 (-2.46%)
Mutual labels:  genomics
Deepvariant
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
Stars: ✭ 2,404 (+1084.24%)
Mutual labels:  genomics
Janggu
Deep learning infrastructure for bioinformatics
Stars: ✭ 174 (-14.29%)
Mutual labels:  genomics

Juicer

Juicer is a platform for analyzing kilobase-resolution Hi-C data. This distribution includes the pipeline for generating Hi-C maps from raw FASTQ files, as well as command line tools for feature annotation on the Hi-C maps.

Juicer is currently in beta (version 1.6). For general questions, please use the Google Group.

If you have further difficulties using Juicer, please do not hesitate to contact us ([email protected]).

If you use Juicer in your research, please cite: Neva C. Durand, Muhammad S. Shamim, Ido Machol, Suhas S. P. Rao, Miriam H. Huntley, Eric S. Lander, and Erez Lieberman Aiden. "Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments." Cell Systems 3(1), 2016.

Documentation

Please see the wiki for extensive documentation.

Questions?

For FAQs, or for asking new questions, please see our forum: aidenlab.org/forum.html.


Distribution

In this repository, we include the scripts for running Juicer on AWS, LSF, Univa Grid Engine, SLURM, and a single CPU.

/AWS - scripts for running pipeline and postprocessing on AWS

/UGER - scripts for running pipeline and postprocessing on UGER (Univa)

/SLURM - scripts for running pipeline and postprocessing on SLURM

/LSF - scripts for running pipeline and postprocessing on LSF (beta)

/CPU - scripts for running pipeline and postprocessing on a single CPU (beta)

/misc - miscellaneous helpful scripts


Hardware and Software Requirements

Juicer is a pipeline optimized for parallel computation on a cluster. Juicer consists of two parts: the pipeline that creates Hi-C files from raw data, and the post-processing command line tools.

Cluster requirements:

Juicer requires the use of a cluster, ideally with >= 4 cores (minimum 1 core) and >= 64 GB of RAM (minimum 16 GB).

Juicer currently works with the following resource management software: Univa Grid Engine (UGER), LSF, and SLURM; it can also run on AWS or on a single CPU (see Distribution above).

Juicer tools requirements

The minimum software requirement to run Juicer is a working Java installation (version >= 1.7) on Windows, Linux, or Mac OS X. We recommend using the latest Java version available, but please do not use a Java beta version. Minimum system requirements for running Java can be found at https://java.com/en/download/help/sysreq.xml

To download and install the latest Java Runtime Environment (JRE), please go to https://www.java.com/download
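
As a quick sanity check (a minimal sketch; the exact version string varies by vendor and platform), you can confirm that Java is on your PATH and meets the 1.7 minimum:

java -version

Any reported version of 1.7 or higher satisfies the requirement.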

GNU CoreUtils

The latest version of GNU coreutils can be downloaded from https://www.gnu.org/software/coreutils/manual/

Burrows-Wheeler Aligner (BWA)

The latest version of BWA should be installed from http://bio-bwa.sourceforge.net/
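
Juicer expects the BWA index files to sit alongside the reference FASTA (see the -z option below). As an illustrative sketch, with hg19.fa standing in for your reference file, the index can be built once with:

bwa index hg19.fa

This writes the .amb, .ann, .bwt, .pac, and .sa files next to the FASTA.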

CUDA (for HiCCUPS peak calling)

You must have an NVIDIA GPU to install CUDA.

Instructions for installing the latest version of CUDA can be found on the NVIDIA Developer site.

The native libraries included with Juicer are compiled for CUDA 7 or CUDA 7.5. See the download page for Juicer Tools.

Other versions of CUDA can be used, but you will need to download the respective native libraries from JCuda.

For best performance, use a dedicated GPU. You may also be able to obtain access to GPU clusters through Amazon Web Services or a local research institution.
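
Before running HiCCUPS, it can be worth verifying that a GPU and the CUDA toolkit are visible on the node (an illustrative check that assumes the NVIDIA driver and toolkit are installed):

nvidia-smi       # lists the available NVIDIA GPUs and the driver version
nvcc --version   # reports the installed CUDA toolkit version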

Building new jars

See the Juicebox documentation at https://github.com/theaidenlab/Juicebox for details on building new juicer_tools jars.


Quick Start

Run the Juicer pipeline on your cluster of choice with "juicer.sh [options]"

Usage: juicer.sh [-g genomeID] [-d topDir] [-q queue] [-l long queue] [-s site]
                 [-a about] [-R end] [-S stage] [-p chrom.sizes path]
                 [-y restriction site file] [-z reference genome file]
                 [-C chunk size] [-D Juicer scripts directory]
                 [-Q queue time limit] [-L long queue time limit] [-e] [-h] [-x]
* [genomeID] must be defined in the script, e.g. "hg19" or "mm10" (default
  "hg19"); alternatively, it can be defined using the -z command
* [topDir] is the top level directory (default
  "/Users/nchernia/Downloads/neva-muck/UGER")
     [topDir]/fastq must contain the fastq files
     [topDir]/splits will be created to contain the temporary split files
     [topDir]/aligned will be created for the final alignment
* [queue] is the queue for running alignments (default "short")
* [long queue] is the queue for running longer jobs such as the hic file
  creation (default "long")
* [site] must be defined in the script, e.g.  "HindIII" or "MboI"
  (default "none")
* [about]: enter description of experiment, enclosed in single quotes
* [stage]: must be one of "chimeric", "merge", "dedup", "final", "postproc", or "early".
    -Use "chimeric" when alignments are done but chimeric handling has not finished
    -Use "merge" when alignment has finished but the merged_sort file has not
     yet been created.
    -Use "dedup" when the files have been merged into merged_sort but
     merged_nodups has not yet been created.
    -Use "final" when the reads have been deduped into merged_nodups but the
     final stats and hic files have not yet been created.
    -Use "postproc" when the hic files have been created and only
     postprocessing feature annotation remains to be completed.
    -Use "early" for an early exit, before the final creation of the stats and
     hic files
* [chrom.sizes path]: enter path for chrom.sizes file
* [restriction site file]: enter path for restriction site file (locations of
  restriction sites in genome; can be generated with the script
  misc/generate_site_positions.py)
* [reference genome file]: enter path for reference sequence file, BWA index
  files must be in same directory
* [chunk size]: number of lines in split files, must be multiple of 4
  (default 90000000, which equals 22.5 million reads)
* [Juicer scripts directory]: set the Juicer directory,
  which should have scripts/ references/ and restriction_sites/ underneath it
  (default /broad/aidenlab)
* [queue time limit]: time limit for queue, e.g. -W 12:00 is 12 hours
  (default 1200)
* [long queue time limit]: time limit for long queue, e.g. -W 168:00 is one week
  (default 3600)
* -f: include fragment-delimited maps from hic file creation
* -e: early exit
* -h: print this help and exit
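
For example, a minimal run might look like the following (an illustrative sketch: the HIC001 directory is a placeholder, and any flag you omit falls back to the defaults listed above):

cd /path/to/HIC001           # top-level directory containing a fastq/ subdirectory with your reads
juicer.sh -g hg19 -s MboI    # align against hg19 using MboI restriction sites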

Juicer Usage

  • Running Juicer with no arguments will run it with genomeID hg19 and site MboI
  • Providing a genome ID: if the genome is not defined in the script, you can either modify the script directly or supply the necessary files on the command line: "-z reference_sequence_path" (the BWA index files must be in the same directory), "-p chrom_sizes_path" (the chromosomes you want included in the .hic file), and "-s site_file" (the listing of all restriction site locations, one line per chromosome). Note that the ligation junction won't be defined in this case. The script misc/generate_site_positions.py can help you generate the site file (see the sketch after this list)
  • Providing a restriction enzyme: if the enzyme is not defined in the script, you can either modify the script directly or provide the site file via the "-s site_file" flag, as above. Alternatively, if you don't want any fragment-level analysis (as with a DNase experiment), assign the site "none", as in juicer.sh -s none
  • Directory structure: Juicer expects the fastq files to be stored in a directory underneath the top-level directory. E.g. HIC001/fastq. By default, the top-level directory is the directory where you are when you launch Juicer; you can change this via the -d flag. Fastqs can be zipped. [topDir]/splits will be created to contain the temporary split files and should be deleted once your run is completed. [topDir]/aligned will be created for the final files, including the hic files, the statistics, the valid pairs (merged_nodups), the collisions, and the feature annotations.
  • Queues are complicated and it's likely that you'll have to modify the script for your system, though we did our best to avoid this. By default there's a short queue and a long queue. We also allow you to pass in wait times for those queues; this is currently ignored by the UGER and SLURM versions. The short queue should be able to complete alignment of one split file. The long queue is for jobs that we expect to take a while, like writing out the merged_sort file
  • Chunk size is intimately tied to your queues; a smaller chunk size means more alignment jobs, each of which finishes faster. If you have a hard limit on the number of jobs, you don't want too small a chunk size. If your short queue has a very limited runtime ceiling, you don't want too big a chunk size. Run time for alignment will also depend on the particulars of your cluster. We launch ~5 jobs per chunk. Chunk size must be a multiple of 4.
  • Relaunch via the same script. Type juicer.sh [options] -S stage where "stage" is one of chimeric, merge, dedup, final, postproc, or early. "chimeric" is for when alignment is done but chimeric read handling hasn't finished; "merge" is for when alignment has finished but merged_sort hasn't been created; "dedup" is for when merged_sort is there but not merged_nodups (this will relaunch all dedup jobs); "final" is for when merged_nodups is there and you want the stats and hic files; "postproc" is for when you have the hic files and just want feature annotations; and "early" is for early exit, before hic file creation. If your jobs failed at the alignment stage, run relaunch_prep.sh and then run juicer.sh (see the sketch after this list)
  • Miscellaneous options include -a 'experiment description', which adds the experiment description to the statistics file and the metadata in the hic file; -r, which allows you to use bwa aln instead of bwa mem (useful for shorter reads); -R [end], in case you have one read end that's short and one that's long and you want to align the short end with bwa aln and the long end with bwa mem; and -D [Juicer scripts directory], to set an alternative Juicer directory, which must have scripts/, references/, and restriction_sites/ underneath it
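
The custom-genome and relaunch scenarios above might look roughly like this (a hedged sketch: the genome name, enzyme, paths, and generated file name are all placeholders; here the restriction site file is passed with -y and the enzyme with -s, following the usage text above):

# generate a restriction site file for a custom genome
python misc/generate_site_positions.py MboI myGenome /path/to/myGenome.fa

# run the pipeline against that genome
juicer.sh -z /path/to/myGenome.fa -p /path/to/myGenome.chrom.sizes -y /path/to/myGenome_MboI.txt -s MboI

# relaunch from the dedup stage after a failure, keeping the same options
juicer.sh -z /path/to/myGenome.fa -p /path/to/myGenome.chrom.sizes -y /path/to/myGenome_MboI.txt -s MboI -S dedup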

Command Line Tools Usage

Detailed documentation about the command line tools can be found on the wiki.

To launch the command line tools, use the shell script "juicer_tools" on Unix/MacOS, or run:

java -jar juicer_tools.jar (command...) [flags...] <parameters...>

There are different flavors of juicer_tools that depend on the CUDA version. If you do not use GPUs, these versions are equivalent. Otherwise, juicer_tools.X.X.jar uses CUDA version X.X.

For HiCCUPS loop calling without the shell or bat script, you will need to call:

java -Xms512m -Xmx2048m -Djava.library.path=path/to/natives/ -jar juicer_tools.jar hiccups [flags...] <parameters...>

where path/to/natives is the path to the native libraries used for JCuda. By default, these are located in the lib/jcuda folder.

In the command line tools, there are several analysis functions:

  1. apa for conducting aggregate peak analysis
  2. hiccups for annotating loops
  3. motifs for finding CTCF motifs
  4. arrowhead for annotating contact domains
  5. eigenvector for calculating the eigenvector (first PC) of the Pearson's correlation matrix
  6. pearsons for calculating the Pearson's correlation matrix

The juicer_tools (Unix/MacOS) script can be used in place of the unwieldy java -Djava.library.path=path/to/natives/ -jar juicer_tools.jar invocation.
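
For instance, the two invocations below are equivalent (an illustrative sketch: inter_30.hic and contact_domains are placeholder file names, and the full argument list for each command is documented on the wiki):

./juicer_tools arrowhead inter_30.hic contact_domains
java -Djava.library.path=path/to/natives/ -jar juicer_tools.jar arrowhead inter_30.hic contact_domains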
