akotlar / bystro

License: Apache-2.0
Bystro genetic analysis (annotation, filtering, statistics)

Bystro

Bystro Publication

For datasets and scripts used, please visit github.com/bystro-paper

If using Bystro, please cite Kotlar et al., Genome Biology, 2018.

Web Tutorial

Start here: TUTORIAL.md

For most users, we recommend https://bystro.io.

The web app gives full access to all of Bystro's capabilities, provides a convenient search/filtering interface, supports large data sets (tested up to 890GB uncompressed/129GB compressed), and has excellent performance.

Installing Bystro

Check out the master branch for the upcoming release

The easiest way is to run from Docker: docker pull akotlar/bystro:latest && docker run akotlar/bystro:latest bystro-annotate.pl

Please read INSTALL.md for instructions on how to download and use the Bystro hg19/hg38/etc. databases.

Bystro relies on pluggable pre-processors, selected via Bystro's YAML config, to normalize variant inputs (for example, dealing with VCF issues such as padding). The pre-processors also determine whether a site is a transition or transversion, calculate the sample minor allele frequency (MAF), identify heterozygous, homozygous, and missing samples, and calculate heterozygosity, homozygosity, missingness, and more.

  1. VCF format: Bystro-Vcf
  2. SNP format: Bystro-SNP
  3. Create your own to support other formats!
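
As a rough illustration of the per-site statistics such a pre-processor produces, here is a minimal Python sketch. It is not Bystro's implementation (Bystro-Vcf and Bystro-SNP are separate programs), and it is written in Python rather than Bystro's own Perl. The function names, the genotype encoding ('0/1'-style strings with '.' for missing), and the exact definition of each statistic are assumptions made for illustration; Bystro's own definitions may differ.

```python
# Illustrative sketch only; not Bystro's code. Names and definitions are assumptions.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def site_stats(genotypes):
    """Summarize biallelic genotypes such as '0/0', '0/1', '1/1', './.'.

    Assumed definitions:
      missingness    = samples without a call / all samples
      heterozygosity = heterozygous samples / called samples
      homozygosity   = homozygous-alternate samples / called samples
      sample_maf     = alternate alleles / called alleles
    """
    hets = homs = missing = alt_alleles = called_alleles = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing += 1
            continue
        n_alt = sum(allele != "0" for allele in alleles)
        alt_alleles += n_alt
        called_alleles += len(alleles)
        if n_alt == len(alleles):
            homs += 1
        elif n_alt > 0:
            hets += 1
    total = len(genotypes)
    called = total - missing
    return {
        "heterozygotes": hets,
        "homozygotes": homs,
        "missing": missing,
        "missingness": missing / total if total else 0.0,
        "heterozygosity": hets / called if called else 0.0,
        "homozygosity": homs / called if called else 0.0,
        "sample_maf": alt_alleles / called_alleles if called_alleles else 0.0,
    }


print(classify_snp("A", "G"))  # transition
print(classify_snp("C", "A"))  # transversion
print(site_stats(["0/0", "0/1", "1/1", "./."])["missingness"])  # 0.25
```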

Annotation (Output) Field Descriptions

Please read FIELDS.md

The Bystro configuration file

  • The config file describes the state of both the database and the annotation. It is required both for annotating and for building a database.
  • It has several keys:
    • tracks: The highest level organization for database values. Tracks have a name property, which must be unique, and a type, which must be one of:

      • sparse: Any bed file, or any file that can be mapped to chrom, chromStart, and chromEnd columns.
        • This is used for dbSNP and ClinVar records, but many files can be fit to this format.
        • Mapping fields can be managed by the fieldMap key
      • score: Accepts any wigFix file.
        • Used for phastCons, phyloP
      • cadd:
        • Accepts any CADD file, or Bystro's custom "bed-like" CADD file, which has 2 header lines, and chrom, chromStart, chromEnd columns, followed by standard CADD fields
      • gene: A UCSC gene track field (ex: knownGene, refGene, sgdGene).
        • The local_files for this are created using an sql_statement
        • Ex: SELECT * FROM hg38.refGene LEFT JOIN hg38.kgXref ON hg38.kgXref.refseq = hg38.refGene.name
    • chromosomes: The allowable chromosomes.

      • Each row of every track must be identified by these chromosomes (during building)
      • Each row of any input file submitted for annotation must also be identified by these chromosomes (during annotation)
      • However, Bystro is flexible about the chr prefix

      Ex: For the following config

      chromosomes:
      - chr1
      - chr2
      - chr3

      Only chr1, chr2, and chr3 will be accepted. However, Bystro tries to make your life easy:

      1. We currently follow UCSC conventions for chromosomes, meaning they should be prefixed with chr.
      2. Bystro will automatically prepend chr to chromosomes read from an input file during annotation.
      3. Bystro allows the transformation of any field during building, configurable in the YAML config file for that assembly, making it easy to prepend chr to the source file chromosome field.

      Ex: ClinVar doesn't have a chr prefix, so during building we specify:

      tracks:
        - name: clinvar
          build_field_transformations:
            chrom: chr .
          fieldMap:
            Chromosome: chrom

      Here fieldMap allows us to rename header fields, and build_field_transformations allows us to define a prepend operation (chr . can be interpreted as the Perl expression "chr" . $chrom).

      So: input files do not need to have their chromosomes prepended by chr. Bystro will normalize the name.

      With the chromosomes list above, both 1 and chr1 will be built/annotated, but 1_rand will not.
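
The rules described for the tracks and chromosomes keys can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Bystro's code (which is Perl). The function names are hypothetical, the set of track types is limited to the four listed in this section, and applying fieldMap before build_field_transformations is an assumption.

```python
# Illustrative sketch only; not Bystro's code. All function names are hypothetical.

TRACK_TYPES = {"sparse", "score", "cadd", "gene"}  # the types listed in this section
ALLOWED = ["chr1", "chr2", "chr3"]                 # mirrors the example `chromosomes` key


def validate_tracks(tracks):
    """Return a list of problems: duplicate track names or unknown track types."""
    problems = []
    seen = set()
    for track in tracks:
        name, kind = track.get("name"), track.get("type")
        if name in seen:
            problems.append(f"duplicate track name: {name}")
        seen.add(name)
        if kind not in TRACK_TYPES:
            problems.append(f"track {name}: unknown type {kind!r}")
    return problems


def normalize_chrom(chrom):
    """Prepend 'chr' when the prefix is missing (UCSC convention)."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def is_accepted(chrom, allowed=ALLOWED):
    """True if the normalized chromosome name appears under `chromosomes`."""
    return normalize_chrom(chrom) in allowed


def build_row(row, field_map, prepend):
    """Rename header fields (as fieldMap does), then prepend a string to the
    chosen fields (as a `chr .` entry in build_field_transformations does)."""
    renamed = {field_map.get(key, key): value for key, value in row.items()}
    for field, prefix in prepend.items():
        if field in renamed:
            renamed[field] = prefix + renamed[field]
    return renamed


print(is_accepted("1"))       # True
print(is_accepted("chr1"))    # True
print(is_accepted("1_rand"))  # False: "chr1_rand" is not in the list
print(build_row({"Chromosome": "1"}, {"Chromosome": "chrom"}, {"chrom": "chr"}))
# {'chrom': 'chr1'}
```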

Directories and Files

These describe where the Bystro database and any source files are located.

  1. files_dir: The parent folder within which each track's local_files are located
    • Bystro automatically checks for local_files at parent/trackName/file

    Ex: For the config file containing

    files_dir: /path/to/files/
    tracks:
      - name: refSeq
        local_files:
          - hg19.refGene.chr1.gz
          # and more files

    Bystro will expect files in /path/to/files/refSeq/hg19.refGene.chr1.gz

  2. database_dir: Each database is held within database_dir, in a folder named after the assembly

    Ex: For the config file containing

    assembly: hg19
    database_dir: /path/to/databases/

    Bystro will look for the database /path/to/databases/hg19
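
The two path rules above can be sketched as follows. This is a hedged Python illustration, not Bystro's own lookup code: the function names are hypothetical, and the config is represented as a plain dict whose keys mirror the YAML examples (with the track list under tracks, as in the configuration-file section).

```python
# Illustrative sketch only; not Bystro's code. Function names are hypothetical.
import posixpath


def expected_source_files(config):
    """List where each track's local_files are expected: files_dir/trackName/file."""
    paths = []
    for track in config.get("tracks", []):
        for local_file in track.get("local_files", []):
            paths.append(posixpath.join(config["files_dir"], track["name"], local_file))
    return paths


def database_path(config):
    """Location of the database: database_dir/assembly."""
    return posixpath.join(config["database_dir"], config["assembly"])


# A dict mirroring the two config examples above.
config = {
    "assembly": "hg19",
    "database_dir": "/path/to/databases/",
    "files_dir": "/path/to/files/",
    "tracks": [{"name": "refSeq", "local_files": ["hg19.refGene.chr1.gz"]}],
}

print(expected_source_files(config))  # ['/path/to/files/refSeq/hg19.refGene.chr1.gz']
print(database_path(config))          # /path/to/databases/hg19
```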
