Boyle-Lab / Blacklist

Licence: gpl-3.0
Application for making ENCODE Blacklists

The ENCODE Blacklist: Identification of Problematic Regions of the Genome

Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist: a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. Removing the ENCODE blacklist regions is an essential quality measure when analyzing functional genomics data.

If you use the blacklists, please cite:

Amemiya, H.M., Kundaje, A. & Boyle, A.P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep 9, 9354 (2019). https://doi.org/10.1038/s41598-019-45839-z

Available Blacklists

For those interested in using the blacklists, current versions for dm3, dm6, ce10, ce11, mm10, hg19, and hg38 are available in the lists/ folder.
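As an illustration of how a blacklist is typically applied, the following minimal sketch drops any interval that overlaps a blacklisted region. It is not part of the Blacklist software. The helper names (`parse_bed`, `overlaps_blacklist`) and the example peaks are hypothetical, and the single blacklist entry is the demo region shown later in this README. It assumes standard BED coordinates (0-based, half-open) and tab-separated input.

```python
from collections import defaultdict


def parse_bed(lines):
    """Group BED intervals (0-based, half-open) by chromosome, sorted by start."""
    regions = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in regions.items()}


def overlaps_blacklist(blacklist, chrom, start, end):
    """Return True if [start, end) overlaps any blacklisted interval on chrom."""
    for b_start, b_end in blacklist.get(chrom, []):
        if b_start >= end:
            break  # sorted by start, so no later interval can overlap
        if b_end > start:
            return True
    return False


# Blacklist entry taken from the demo output; the peaks are made-up examples.
blacklist = parse_bed(["chrUn_GL456392\t5200\t23600\tHigh Signal Region\n"])
peaks = [
    ("chrUn_GL456392", 100, 500),    # upstream of the region: kept
    ("chrUn_GL456392", 5000, 5300),  # overlaps the region: dropped
    ("chr1", 5200, 5300),            # different contig: kept
]
kept = [p for p in peaks if not overlaps_blacklist(blacklist, *p)]
print(kept)  # [('chrUn_GL456392', 100, 500), ('chr1', 5200, 5300)]
```

In practice, `parse_bed` would be given the lines of one of the BED files in lists/ for the matching genome assembly. For large peak sets, a dedicated interval tool is likely a better fit than this linear scan.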

System Requirements

Hardware Requirements

Generating the Blacklist requires a significant amount of RAM and disk storage, depending on the size of the genome analyzed and the number of input data files processed. At a minimum, we recommend a computer with the following specs:

RAM: 64+ GB
CPU: 24+ cores, 3.4+ GHz/core

The runtime on this minimal system is approximately 192 CPU hours. Compile time is approximately 1.1 seconds.

Software Requirements

The development version of the package has been tested on Linux operating systems, specifically on the following system:

Linux: Ubuntu 18.04

Demo

We include a small demo file of an unmapped chromosome from mm10 (chrUn_GL456392). The demo runs in approximately 0.025 seconds. The expected output is a BED annotation of an abnormal region across the entire segment:

cd demo
./Blacklist chrUn_GL456392
chrUn_GL456392	5200	23600	High Signal Region

Installation

Clone a copy of the Blacklist repository and submodules:

git clone --recurse-submodules https://github.com/Boyle-Lab/Blacklist.git

Build the bamtools API (see the bamtools documentation for more information). Note: bamtools requires zlib to be installed.

cd Blacklist/bamtools/
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=$(cd ..; pwd)/install ..
make
make install
cd ../..

Build Blacklist

make

The Blacklist software relies on a specific directory structure relative to the executable to function properly. All input data tracks should be sorted and indexed BAM files.

  • Blacklist executable
    • input/ - folder containing all bam and bam.bai files
    • mappability/ - folder containing all uint8 Umap files
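Before starting a long run, it can help to confirm that this layout is in place. The sketch below is a hypothetical pre-flight check and not part of the Blacklist software. The function name `check_layout` is an assumption, and the check covers only what is stated above: both folders exist, and every .bam file in input/ has a matching .bam.bai index. It does not verify that the BAM files are sorted, and it does not inspect the Umap files.

```python
from pathlib import Path


def check_layout(root):
    """Return a list of problems found in the expected layout under root."""
    root = Path(root)
    problems = []
    # Both folders are expected next to the executable.
    for folder in ("input", "mappability"):
        if not (root / folder).is_dir():
            problems.append(f"missing folder: {folder}/")
    input_dir = root / "input"
    if input_dir.is_dir():
        bams = sorted(input_dir.glob("*.bam"))
        if not bams:
            problems.append("no .bam files in input/")
        for bam in bams:
            # Each BAM file needs an index named <file>.bam.bai alongside it.
            index = bam.with_name(bam.name + ".bai")
            if not index.is_file():
                problems.append(f"missing index: {index.name}")
    return problems
```

Calling `check_layout(".")` from the directory that contains the executable returns an empty list when nothing is missing.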

Usage information

The blacklist is built on a per-chromosome or contig level. The following example will build a blacklist for a contig labeled chr1 and output the regions to chr1.bed:

./Blacklist chr1 > chr1.bed
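Because the executable handles one contig per invocation, building a blacklist for a whole genome means repeating the command above for each contig. The following is a minimal sketch of such a loop, assuming it is run from the directory containing the executable. The function name `build_blacklist` and its parameters are hypothetical; only the `./Blacklist <contig>` invocation comes from this README.

```python
import subprocess
from pathlib import Path


def build_blacklist(contigs, exe="./Blacklist", out_dir=".", run=subprocess.run):
    """Run the executable once per contig, writing stdout to <contig>.bed."""
    outputs = []
    for contig in contigs:
        out_path = Path(out_dir) / f"{contig}.bed"
        with open(out_path, "w") as handle:
            # Equivalent to: ./Blacklist <contig> > <contig>.bed
            # check=True raises CalledProcessError on a non-zero exit status.
            run([exe, contig], stdout=handle, check=True)
        outputs.append(out_path)
    return outputs


# Example call (contig names are placeholders; use those of your genome):
# build_blacklist(["chr1", "chr2"])
```

The `run` parameter exists only so the loop can be exercised without the real executable. Since each contig is processed independently, the same idea could be adapted to run several contigs at once, memory permitting.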

Historical blacklist information

(these lists are also available in the lists/ folder)

ENCODE
