
brentp / bigly

License: Apache-2.0
a pileup library that embraces the huge

Programming languages: Go, Python


bigly: a pileup library that embraces the huge


bigly is an API and a command-line app (binaries here). It is similar to samtools mpileup but it reports a huge number of additional variables that are useful for structural variant calling and visualization.

For each requested position, the struct below is filled from the corresponding position in every overlapping alignment that passes the requested filters:

// Pile holds the information about a single base.
type Pile struct {
	Chrom                  string
	Pos                    int
	Depth                  int    // count of reads passing filters.
	RefBase                byte   // Reference base at this position.
	MisMatches             uint32 // number of mismatches.
	ProperPairs            int    // count of reads with paired flag
	SoftStarts             uint32 // counts of base preceding an 'S' cigar op
	SoftEnds               uint32 // ...  following ...
	HardStarts             uint32 // counts of base preceding an 'H' cigar op
	HardEnds               uint32
	InsertionStarts        uint32 // counts of base preceding an 'I' cigar op
	InsertionEnds          uint32
	Deletions              uint32  // counts of deletions 'D' at this base
	Heads                  uint32  // counts of starts of reads at this base
	Tails                  uint32  // counts of ends of reads at this base
	Splitters              uint32  // count of non-secondary reads with SA tags.
	Splitters1             uint32  // count of non-secondary reads with exactly 1 SA tag.
	Bases                  []byte  // All bases from reads covering this position
	Quals                  []uint8 // All quals from reads covering this position
	MeanInsertSizeLP       uint32  // Calculated with left-most of pair
	MeanInsertSizeRM       uint32  // Calculated with right-most of pair
	OrientationPlusPlus    uint32  // Paired reads mapped in +/+ orientation
	OrientationMinusMinus  uint32  // Paired reads mapped in -/- orientation
	OrientationMinusPlus   uint32  // Paired reads mapped in -/+ orientation
	OrientationSplitter    uint32  // Count of +/- or -/+ splitters.
	Discordant             uint32  // Number of reads with insert size > ConcordantCutoff
	DiscordantChrom        uint32  // Number of reads mapping on different chroms
	DiscordantChromEntropy float32 // high value means all discordants came from same chrom.
	GC65                   uint32
	GC257                  uint32
	Duplicity65            float32 // measure of lack of sequence entropy.
	Duplicity257           float32 // measure of lack of sequence entropy.
	SplitterPositions      []int
	SplitterStrings        []string
}
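
As a rough illustration of how these fields might be consumed downstream, the sketch below computes the share of reads at a position that show soft-clip evidence. The local `pile` type is a trimmed stand-in for the struct above (only three fields are copied), and `softClipFraction` is a hypothetical helper for this example, not part of bigly.

```go
package main

import "fmt"

// pile is a trimmed, local stand-in for bigly's Pile struct; only the
// fields used in this sketch are reproduced.
type pile struct {
	Depth      int
	SoftStarts uint32
	SoftEnds   uint32
}

// softClipFraction is a hypothetical helper: soft-clip events recorded at
// this position divided by depth. It returns 0 when depth is not positive.
func softClipFraction(p pile) float64 {
	if p.Depth <= 0 {
		return 0
	}
	return float64(p.SoftStarts+p.SoftEnds) / float64(p.Depth)
}

func main() {
	p := pile{Depth: 40, SoftStarts: 6, SoftEnds: 4}
	fmt.Printf("%.2f\n", softClipFraction(p)) // prints 0.25
}
```

A ratio like this is only one possible summary; which fields to combine, and how, depends on the caller.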

The program in cmd/bigly/main.go is distributed as an example of what one can do with this library: namely, make an enhanced pileup in a few lines of code.

Usage

At this time, the usage of the example program is very simple. The default exclude flags are (sam.Unmapped | sam.QCFail | sam.Duplicate).

bigly $bam $chrom:$start-$end > o
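
The region argument has the form chrom:start-end. A minimal sketch of parsing such a string is shown below; `region` and `parseRegion` are hypothetical names for this example and are not taken from bigly or bamat, and the sketch makes no assumption about whether coordinates are 0-based or 1-based.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// region is a hypothetical container for a parsed chrom:start-end string.
type region struct {
	Chrom string
	Start int
	End   int
}

// parseRegion splits "chrom:start-end" into its parts. It splits on the
// last ':' so that chromosome names containing ':' are still handled.
func parseRegion(s string) (region, error) {
	i := strings.LastIndex(s, ":")
	if i <= 0 {
		return region{}, fmt.Errorf("region %q: expected chrom:start-end", s)
	}
	parts := strings.SplitN(s[i+1:], "-", 2)
	if len(parts) != 2 {
		return region{}, fmt.Errorf("region %q: expected chrom:start-end", s)
	}
	start, err := strconv.Atoi(parts[0])
	if err != nil {
		return region{}, fmt.Errorf("region %q: bad start: %v", s, err)
	}
	end, err := strconv.Atoi(parts[1])
	if err != nil {
		return region{}, fmt.Errorf("region %q: bad end: %v", s, err)
	}
	if start > end {
		return region{}, fmt.Errorf("region %q: start is after end", s)
	}
	return region{Chrom: s[:i], Start: start, End: end}, nil
}

func main() {
	r, err := parseRegion("chr1:100-200")
	fmt.Println(r.Chrom, r.Start, r.End, err) // prints chr1 100 200 <nil>
}
```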

If a reference is specified with -r, it will report statistics about GC content in windows surrounding each base.
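
To make the idea of a GC window concrete, here is a small sketch that counts G and C bases in a window around a position. It assumes the window is centered on the base and clipped at the sequence ends; that layout, and the name `gcCount`, are assumptions for illustration rather than a description of how bigly computes its GC fields.

```go
package main

import "fmt"

// gcCount counts G/C bases in a window of the given odd width centered on
// pos (0-based index into seq). The window is clipped at the sequence ends.
func gcCount(seq string, pos, width int) int {
	half := width / 2
	lo, hi := pos-half, pos+half+1
	if lo < 0 {
		lo = 0
	}
	if hi > len(seq) {
		hi = len(seq)
	}
	n := 0
	for i := lo; i < hi; i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return n
}

func main() {
	seq := "ACGTGGCCAT"
	// Window of width 5 centered on index 4 covers "GTGGC".
	fmt.Println(gcCount(seq, 4, 5)) // prints 4
}
```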

help:

bigly 0.2.0
usage: bigly [--minbasequality MINBASEQUALITY] [--minmappingquality MINMAPPINGQUALITY] [--excludeflag EXCLUDEFLAG] [--includeflag INCLUDEFLAG] [--mincliplength MINCLIPLENGTH] [--includebases] [--splitterverbosity SPLITTERVERBOSITY] [--reference REFERENCE] BAMPATH REGION

positional arguments:
  bampath
  region

options:
  --minbasequality MINBASEQUALITY, -q MINBASEQUALITY
                         base quality threshold [default: 10]
  --minmappingquality MINMAPPINGQUALITY, -Q MINMAPPINGQUALITY
                         mapping quality threshold [default: 5]
  --excludeflag EXCLUDEFLAG, -F EXCLUDEFLAG
  --includeflag INCLUDEFLAG, -f INCLUDEFLAG
  --mincliplength MINCLIPLENGTH, -c MINCLIPLENGTH
                         only count H/S clips of at least this length [default: 15]
  --includebases, -b     output each base and base quality score
  --splitterverbosity SPLITTERVERBOSITY, -s SPLITTERVERBOSITY
                         0-only count; 1:count and single most frequent; 2:all SAs; 3:dont shorten positions
  --reference REFERENCE, -r REFERENCE
                         optional path to reference fasta.
  --help, -h             display this help and exit
  --version              display version and exit

API

GoDoc documentation is here: https://godoc.org/github.com/brentp/bigly

A supporting library that eases regional BAM queries is documented here: https://godoc.org/github.com/brentp/bigly/bamat

View the tests for more examples. bigly uses the biogo library for BAM access.

Plotting

python scripts/plotter.py bigly.output

will make a plot with python+matplotlib on output from bigly.

An example image looks like: [image: Example]

This is a homozygous deletion easily seen from the depth, but we can see that the deletion is nicely delineated by soft-clips and we see aberrant insert sizes bounding the deletion. In most libraries, we would also see splitters flanking the region.

TODO

  • Track strand of bases.
  • Report the 5th and 95th percentile of insert size.
  • More efficient insert-size calc
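
For the percentile item, one possible approach is a nearest-rank percentile over the collected insert sizes. The sketch below is illustrative only: `percentile` is a hypothetical function, the insert sizes are made-up values, and nothing here reflects code that exists in bigly.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// values. It sorts a copy so the input is left untouched, and returns 0
// for an empty input.
func percentile(values []int, p int) int {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]int(nil), values...)
	sort.Ints(sorted)
	// Integer ceiling of p*n/100 gives the 1-based nearest rank.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up insert sizes with one outlier.
	sizes := []int{300, 310, 290, 5000, 305, 295, 315, 285, 320, 280}
	fmt.Println(percentile(sizes, 5), percentile(sizes, 95)) // prints 280 5000
}
```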

Credits

Ryan Layer's svv was the inspiration for the plotter.

Obviously, samtools is the reference pileup implementation.
