bxtools - Tools for analyzing 10X Genomics data

License: MIT

Note: bxtools is an emerging project. If you need an operation that may fall within the scope of bxtools, please submit an issue report or pull request describing the suggested functionality. We welcome community suggestions for what to include.

Installation

git clone --recursive https://github.com/walaj/bxtools
cd bxtools
./configure
make 
make install

Description

bxtools is a set of lightweight command-line tools for analyzing 10X Genomics data. It handles low-level operations in a 10X-specific way by accounting for the BX tag in 10X data.

Components

Split

Split a BAM file by the BX tag.

## split a BAM into individual BAMs (called test.<bx>.bam). Don't output tags with < 10 reads
bxtools split $bam -a test -m 10 > counts.tsv

## split a portion of a BAM 
samtools view -h $bam 1:1,000,000-2,000,000 | bxtools split - -a test > counts.tsv

## just get the BX counts and sort by prevalence
bxtools split $bam -x | sort -n -k 2,2 > counts.tsv
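
The counting step behind the commands above can be illustrated with a minimal Python sketch. It is not the bxtools implementation: the helper names (`get_tag`, `count_barcodes`), the `min_reads` parameter (standing in for `-m`), and the toy SAM records are all assumptions made for illustration.

```python
from collections import Counter

def get_tag(sam_line, tag="BX"):
    """Return the value of an optional SAM tag (e.g. BX:Z:...), or None if absent."""
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None

def count_barcodes(sam_lines, min_reads=1):
    """Count reads per BX tag, dropping barcodes with fewer than min_reads reads."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        bx = get_tag(line)
        if bx is not None:
            counts[bx] += 1
    return {bx: n for bx, n in counts.items() if n >= min_reads}

# Toy SAM records (hypothetical data): 11 mandatory fields plus a BX tag
reads = [
    "r1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:AAAC-1",
    "r2\t0\t1\t150\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:AAAC-1",
    "r3\t0\t1\t900\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:GGTT-1",
]
print(count_barcodes(reads, min_reads=2))
```

With `min_reads=2` only the barcode seen twice survives, which mirrors the idea of suppressing rare tags.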

Stats

Collect BX-level statistics from a 10X BAM

bxtools stats $bam > stats.tsv
## output columns: BX, read count, median insert size, median mapq, median AS.

To summarize by a different tag, use -t, e.g.: bxtools stats -t MI $bam
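
As a rough illustration of the per-barcode summary described by the output columns, the sketch below groups pre-parsed records by barcode and takes medians. The function name `barcode_stats`, the tuple layout, and the choice to use the absolute insert size are assumptions; bxtools may compute these values differently.

```python
from collections import defaultdict
from statistics import median

def barcode_stats(records):
    """Group (bx, insert_size, mapq, alignment_score) tuples by barcode and summarize."""
    groups = defaultdict(list)
    for bx, isize, mapq, a_score in records:
        # Assumption: insert sizes are compared by magnitude, ignoring strand sign
        groups[bx].append((abs(isize), mapq, a_score))
    rows = []
    for bx, vals in sorted(groups.items()):
        isizes, mapqs, scores = zip(*vals)
        rows.append((bx, len(vals), median(isizes), median(mapqs), median(scores)))
    return rows

# Hypothetical records: (BX, insert size, MAPQ, AS)
records = [
    ("AAAC-1", 300, 60, 120),
    ("AAAC-1", -320, 60, 118),
    ("AAAC-1", 340, 20, 100),
    ("GGTT-1", 250, 60, 125),
]
for row in barcode_stats(records):
    print("\t".join(str(x) for x in row))
```

Grouping by a different key (for example the MI value instead of BX) would follow the same pattern.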

Tile

Collect BX-level read counts on a tiled genome

## default is 1kb tiles, across entire genome
bxtools tile $bam > counts.bed

## input bed to check (e.g. chr1 only)
samtools view -h $bam 1:1-250,000,000 | bxtools tile - -b chr1.tiles.bed > chr1.tiles.counts.bed
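
The tiling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the bxtools code: reads are assigned to a tile by their 0-based start position only, and the function name `tile_counts`, the `width` parameter, and the printed column order are hypothetical.

```python
from collections import Counter

def tile_counts(reads, width=1000):
    """Count reads per (chrom, tile start, barcode), assigning each read by its start position."""
    counts = Counter()
    for chrom, pos, bx in reads:
        tile_start = (pos // width) * width  # 0-based start of the tile containing pos
        counts[(chrom, tile_start, bx)] += 1
    return counts

# Hypothetical reads: (chromosome, 0-based start, BX)
reads = [
    ("1", 120, "AAAC-1"),
    ("1", 980, "AAAC-1"),
    ("1", 1500, "AAAC-1"),
    ("1", 1700, "GGTT-1"),
]
for (chrom, start, bx), n in sorted(tile_counts(reads).items()):
    print(chrom, start, start + 1000, bx, n, sep="\t")
```

Passing `width=2000` would merge the first two tiles, analogous to choosing a larger bin size.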

Relabel

Move the BX barcodes from the BX tag (e.g. BX:ACTTACCGA) to the read name (e.g. qname_ACTTACCGA)

VERBOSE=-v ## print progress
bxtools relabel $bam $VERBOSE > relabeled.bam
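
For intuition, the renaming can be sketched on a single SAM text line. The helper `relabel` is hypothetical, and the sketch leaves the BX tag in place after copying it into the read name; whether bxtools also strips the tag is not stated here.

```python
def relabel(sam_line):
    """Append the BX barcode to the read name as qname_BARCODE; leave untagged reads unchanged."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("BX:Z:"):
            fields[0] = fields[0] + "_" + field[len("BX:Z:"):]
            break
    return "\t".join(fields)

# Hypothetical SAM record carrying a BX tag
line = "read7\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:ACTTACCGA"
print(relabel(line).split("\t")[0])
```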

Mol

Get the minimum molecular footprint on the genome as a BED file for each MI tag. The minimal footprint spans from the minimum start position to the maximum end position of all reads sharing an MI tag. An error is reported if the same MI tag is detected on multiple chromosomes.

The output BED format is chr, start, end, MI, BX, read_count

bxtools mol $bam > mol_footprint.bed
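
The footprint definition above (minimum start to maximum end over all reads sharing an MI tag) can be sketched as follows. The function name `molecule_footprints`, the input tuple layout, and the use of `ValueError` for the multi-chromosome case are assumptions made for this illustration.

```python
def molecule_footprints(reads):
    """Return BED-like rows (chrom, start, end, MI, BX, read_count), one per MI tag."""
    mols = {}
    for chrom, start, end, mi, bx in reads:
        if mi not in mols:
            mols[mi] = [chrom, start, end, mi, bx, 1]
            continue
        m = mols[mi]
        if m[0] != chrom:
            raise ValueError(f"MI tag {mi} seen on multiple chromosomes: {m[0]} and {chrom}")
        m[1] = min(m[1], start)  # extend footprint to the left
        m[2] = max(m[2], end)    # extend footprint to the right
        m[5] += 1
    return [tuple(m) for m in mols.values()]

# Hypothetical reads: (chromosome, start, end, MI, BX)
reads = [
    ("1", 5000, 5150, 1, "AAAC-1"),
    ("1", 1200, 1350, 1, "AAAC-1"),
    ("1", 9000, 9150, 1, "AAAC-1"),
    ("2", 300, 450, 2, "GGTT-1"),
]
for row in molecule_footprints(reads):
    print("\t".join(str(x) for x in row))
```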

Convert

Switch the alignment chromosome with the BX tag. This is a hack that allows a 10X BAM to be sorted and indexed by BX tag rather than by coordinate, which is useful for rapid lookup of all reads carrying a particular BX. Note that "-" is replaced with "_" in the barcode so that queries work with samtools view. The conversion requires two passes: the first collects all unique BX tags to build the new BAM header, and the second makes the switches. As a result, streaming from stdin is not available.

bxtools convert $bam | samtools sort - -o bx_sorted.bam
samtools index bx_sorted.bam
samtools view bx_sorted.bam AGTCCAAGTCGGAAGT_1
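
The two-pass structure can be sketched in Python on pre-parsed reads. This is an illustration only: the function name `convert`, the tuple layout, and the decision to carry the original chromosome along as a trailing field are assumptions, not a description of how bxtools stores it.

```python
def convert(reads):
    """Two-pass sketch: collect barcodes for the header, then swap the reference name for the barcode."""
    # Pass 1: every unique barcode ("-" replaced by "_") becomes a reference name in the new header
    names = sorted({bx.replace("-", "_") for _qname, _chrom, _pos, bx in reads})
    # Pass 2: rewrite each read so its reference name is its barcode
    converted = [
        (qname, bx.replace("-", "_"), pos, chrom)  # original chromosome kept as a trailing field
        for qname, chrom, pos, bx in reads
    ]
    return names, converted

# Hypothetical reads: (read name, chromosome, position, BX)
reads = [
    ("r1", "1", 100, "AGTCCAAGTCGGAAGT-1"),
    ("r2", "2", 500, "AGTCCAAGTCGGAAGT-1"),
    ("r3", "1", 900, "TTTGGGCCAAACCCGG-1"),
]
names, converted = convert(reads)
print(names)
print(converted[0])
```

Because the input is traversed twice, it has to be a list or a file that can be reopened, which matches the restriction on streaming from stdin.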

Example recipes

Get BX level coverage in 2kb bins across genome, ignore low-frequency tags

## make a list of bad tags (freq < 100)
samtools view -h $bam 1:1-10,000,000 | bxtools split - -x | awk '$2 < 100' | cut -f1 > excluded_list.txt

## get the coverage, while excluding bad tags (grep: -F literal, -f file, -v exclude)
samtools view -h $bam 1:1-10,000,000 | grep -v -F -f excluded_list.txt | bxtools tile - -w 2000 > bxcov.bed

Attributions

This project is developed and maintained by Jeremiah Wala.

Analysis suggestions and 10X support

  • Tushar Kamath - MD-PhD Student, Harvard Medical School
  • Gavin Ha - Postdoctoral Fellow, Broad Institute
  • Srinivas Viswanathan - Oncology Fellow, Dana Farber Cancer Institute
  • Chris Whelan - Computational Biologist, Broad Institute
  • Cheng-Zhong Zhang - Assistant Professor, Dana Farber Cancer Institute
  • Marcin Imielinski - Assistant Professor, Weill Cornell Medical College
  • Rameen Beroukhim - Assistant Professor, Dana Farber Cancer Institute
  • Matthew Meyerson - Professor, Dana Farber Cancer Institute