ngsDist

ngsDist is a program to estimate pairwise genetic distances directly, taking the uncertainty of genotype assignment into account. It does so by avoiding genotype calling and using genotype likelihoods or posterior probabilities instead.

Citation

ngsDist was published in 2015 in the Biological Journal of the Linnean Society; please cite it if you use it in your work:

Vieira FG, Lassalle F, Korneliussen TS, Fumagalli M
Improving the estimation of genetic distances from Next-Generation Sequencing data
Biological Journal of the Linnean Society (2015) doi: 10.1111/bij.12511

Installation

ngsDist is easy to install but has some external dependencies:

Mandatory:
- gcc: >= 4.9.2 (tested on Debian 7.8, wheezy)
- zlib: v1.2.7 (tested on Debian 7.8, wheezy)
- gsl: v1.15 (tested on Debian 7.8, wheezy)

Optional (only needed for testing or auxiliary scripts):
- md5sum

To install the entire package just download the source code:

% git clone https://github.com/fgvieira/ngsDist.git

and run:

% cd ngsDist
% make

To run the tests (only if installed through ngsTools):

% make test

Executables are built into the main directory. If you wish to clean all binaries and intermediate files:

% make clean

Usage

% ./ngsDist [options] --geno /path/to/input/file --n_ind INT --n_sites INT --out /path/to/output/file

Parameters

--geno FILE: input file with genotypes, genotype likelihoods or genotype posterior probabilities.
--n_ind INT: sample size (number of individuals).
--n_sites INT: number of sites in input file.
--tot_sites INT: total number of sites in dataset.
--labels FILE: labels, one per line, of the input sequences.
--probs: the input is genotype probabilities (likelihoods or posteriors).
--log_scale: the input is in log-scale.
--call_geno: call genotypes before running analyses.
--N_thresh DOUBLE: minimum threshold to consider a site; treated as missing data otherwise (assumes --call_geno).
--call_thresh DOUBLE: minimum threshold to call a genotype; left as is otherwise (assumes --call_geno).
--pairwise_del: pairwise deletion of missing data.
--avg_nuc_dist: use average number of nucleotide differences as distance (by default, ngsDist uses genotype distances based on allele frequency differences). Only pairs of heterozygous positions are actually affected when using this option, with their distance being 0.5 (instead of 0 by default).
--indep_geno: assume independence between genotypes?
--n_boot_rep INT: number of bootstrap replicates [0].
--boot_block_size INT: block size (in alignment positions) for bootstrapping [1].
--out FILE: output file name.
--n_threads INT: number of threads to use. [1]
--verbose INT: selects verbosity level. [1]
--seed INT: random number generator seed (only for the bootstrap analysis).

Input data

As input, ngsDist accepts genotypes, genotype likelihoods (GL) or genotype posterior probabilities (GP). Genotypes must be input as a gzipped TSV file with one row per site and one column per individual, with genotypes coded as [-1, 0, 1, 2]. The file can have a header and an arbitrary number of columns preceding the actual data (all of which are ignored), much like the Beagle file format. For GL and GP, ngsDist accepts both gzipped TSV and binary formats, with 3 columns per individual and, in the case of binary, the GL/GP values coded as doubles.
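
As an illustration of the called-genotype layout described above, the following Python sketch writes a small gzipped TSV file (one row per site, one column per individual, no header and no leading columns) and then counts its rows and columns, which are the values needed for --n_sites and --n_ind. The function names and the example file name are hypothetical, and reading -1 as "missing" and 0/1/2 as allele counts is an assumption, not something this README states.

```python
import gzip
import os
import tempfile

# Genotype codes listed in the README; their meaning (-1 = missing,
# 0/1/2 = allele counts) is an assumption made for this sketch.
ALLOWED = {-1, 0, 1, 2}


def write_genotypes(path, sites):
    """Write genotypes as gzipped TSV: one row per site, one column per individual."""
    with gzip.open(path, "wt") as out:
        for row in sites:
            bad = [g for g in row if g not in ALLOWED]
            if bad:
                raise ValueError(f"unexpected genotype codes: {bad}")
            out.write("\t".join(map(str, row)) + "\n")


def genotype_dims(path):
    """Return (n_sites, n_ind) for a header-less gzipped TSV genotype file."""
    n_sites = 0
    n_ind = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if n_sites == 0:
                n_ind = len(fields)
            elif len(fields) != n_ind:
                raise ValueError("rows have different numbers of columns")
            n_sites += 1
    return n_sites, n_ind


# Four sites (rows) for three individuals (columns).
sites = [[0, 1, 2], [2, -1, 0], [1, 1, 0], [0, 0, 0]]
path = os.path.join(tempfile.mkdtemp(), "example.geno.gz")
write_genotypes(path, sites)
n_sites, n_ind = genotype_dims(path)
print(f"--geno {path} --n_ind {n_ind} --n_sites {n_sites}")
```

The printed line only assembles the three options that depend on the file; it does not run ngsDist.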

Evolutionary models

ngsDist calculates a "p-distance"; its biggest strength is the ability to take genotype uncertainty (from genotype likelihoods) into account. It currently does not use any evolutionary model (e.g. JC, K2P), although this could be added in the future.
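
To make the idea of a p-distance under genotype uncertainty concrete, here is a minimal Python sketch. It is not ngsDist's implementation. It assumes that each individual has three genotype probabilities per site (for genotypes 0, 1 and 2), that the two individuals' genotypes are independent, and that the default "allele frequency difference" between genotypes g1 and g2 is |g1 - g2| / 2. The only behaviour taken from the parameter descriptions is that a pair of heterozygotes has distance 0 by default and 0.5 with --avg_nuc_dist. All function names are hypothetical.

```python
from itertools import product


def genotype_distance(g1, g2, avg_nuc_dist=False):
    """Distance between two genotypes coded as 0, 1 or 2 (assumed allele counts)."""
    if avg_nuc_dist and g1 == 1 and g2 == 1:
        # Two heterozygotes: half of the allele pairs differ.
        return 0.5
    # Assumed default: difference in within-individual allele frequency.
    return abs(g1 - g2) / 2


def expected_site_distance(p1, p2, avg_nuc_dist=False):
    """Expected distance at one site, weighting every genotype pair by its probability."""
    return sum(
        p1[a] * p2[b] * genotype_distance(a, b, avg_nuc_dist)
        for a, b in product(range(3), repeat=2)
    )


def p_distance(ind1, ind2, avg_nuc_dist=False):
    """Average the expected per-site distances over all sites."""
    per_site = [
        expected_site_distance(p1, p2, avg_nuc_dist)
        for p1, p2 in zip(ind1, ind2)
    ]
    return sum(per_site) / len(per_site)


# Two individuals, two sites; each site holds P(g=0), P(g=1), P(g=2).
ind_a = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
ind_b = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(p_distance(ind_a, ind_b))
print(p_distance(ind_a, ind_b, avg_nuc_dist=True))
```

At the first site, individual A is equally likely to be homozygous or heterozygous, so the site contributes a value between the two corresponding distances instead of forcing a single genotype call.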

Bootstrap Trees

If you want branch support values on your tree, you can use ngsDist with the options --n_boot_rep and --boot_block_size to bootstrap the input data. ngsDist will output one distance matrix (the first) for the full input dataset, followed by one matrix for each of the --n_boot_rep bootstrap replicates. Afterwards, infer a tree for each matrix using the program of your choice and plot them. For example, using FastME on a dataset with 5 bootstrap replicates (hence -D 6: one matrix for the full dataset plus five replicates):

fastme -T 20 -i testA_8B.dist -s -D 6 -o testA_8B.nwk

split the input dataset tree from the bootstrapped ones:

head -n 1 testA_8B.nwk > testA_8B.main.nwk
tail -n +2 testA_8B.nwk | awk 'NF' > testA_8B.boot.nwk

and, to place supports on the main tree, use RAxML:

raxmlHPC -f b -t testA_8B.main.nwk -z testA_8B.boot.nwk -m GTRCAT -n testA_8B

or RAxML-NG:

raxml-ng --support --tree testA_8B.main.nwk --bs-trees testA_8B.boot.nwk --prefix testA_8B
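
The role of --boot_block_size in the bootstrap step can be illustrated with a small Python sketch of block resampling. This is a generic block bootstrap, not ngsDist's code: it assumes fixed, non-overlapping blocks of consecutive sites drawn with replacement until the replicate has as many sites as the original data. How ngsDist actually defines and draws its blocks is not described here, and the function name is hypothetical.

```python
import random


def block_bootstrap_indices(n_sites, block_size, rng):
    """Resample site indices in contiguous blocks, with replacement."""
    # Start positions of the non-overlapping blocks (the last one may be shorter).
    starts = list(range(0, n_sites, block_size))
    indices = []
    while len(indices) < n_sites:
        start = rng.choice(starts)
        indices.extend(range(start, min(start + block_size, n_sites)))
    # Trim so the replicate has exactly as many sites as the input.
    return indices[:n_sites]


# A seeded generator makes the replicate reproducible, in the spirit of --seed.
rng = random.Random(42)
replicate = block_bootstrap_indices(n_sites=20, block_size=5, rng=rng)
print(replicate)
```

With a block size of 1 this reduces to an ordinary site-by-site bootstrap; larger blocks keep neighbouring sites together, which is the usual motivation when nearby positions are not independent.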

Thread pool

The thread pool implementation was adapted from Mathias Brossard's threadpool, which is freely available from: https://github.com/mbrossard/threadpool
