alastair-droop / Fqtools

Licence: gpl-3.0
An efficient FASTQ manipulation suite

Introduction

fqtools is a software suite for fast processing of FASTQ files. It supports a range of file manipulations; see below for the full list of available subcommands and a brief description of each. Most subcommands take either a single file or a pair of files as input. If no input file is specified, fqtools attempts to read data from stdin; in this case, it is advisable to specify the format of the data provided. Subcommands that generate FASTQ data write either a single file or a pair of files. If no -o argument is provided, single files are written to stdout.

Citation

If you use fqtools in published work, please include a reference to my Bioinformatics paper:

  • Droop, A. P. (2016). fqtools: An efficient software suite for modern FASTQ file manipulation. Bioinformatics (Oxford, England). [DOI:10.1093/bioinformatics/btw088]

Installation

fqtools requires building against both the zlib and htslib libraries:

  • zlib is required for processing compressed (.gz) data. The code relies on several recent zlib file I/O functions, so zlib must be version 1.2.3.5 or later.
  • htslib is required for reading BAM files. If htslib is not installed, download and compile htslib. Then, alter the HTSDIR path in the fqtools Makefile to point to the htslib source directory.

If zlib is already installed, fqtools can be built with steps similar to the following:

git clone https://github.com/alastair-droop/fqtools
cd fqtools/
git clone https://github.com/samtools/htslib
cd htslib/
autoheader
autoconf 
./configure
make
make install
cd ..
make

You might need to run make install as sudo make install. The htslib library must be installed in a location where the built fqtools program can find it, because the fqtools executable is dynamically linked to htslib. If you cannot (or do not want to) install htslib, you must add the location of the libhts.so file to your LD_LIBRARY_PATH variable.

Licence

fqtools is released under the GNU General Public License version 3.

Subcommands

The fqtools suite contains the following subcommands:

  • view View FASTQ files
  • head View the first reads in FASTQ files
  • count Count FASTQ file reads
  • header View FASTQ file header data
  • sequence View FASTQ file sequence data
  • quality View FASTQ file quality data
  • header2 View FASTQ file secondary header data
  • fasta Convert FASTQ files to FASTA format
  • basetab Tabulate FASTQ base frequencies
  • qualtab Tabulate FASTQ quality character frequencies
  • type Attempt to guess the FASTQ quality encoding type
  • validate Validate FASTQ files
  • find Find FASTQ reads containing specific sequences
  • trim Trim reads in a FASTQ file
  • qualmap Translate quality values using a mapping file

Each subcommand has its own set of arguments. The global arguments are:

  • -h Show this help message and exit.
  • -v Show the program version and exit.
  • -d Allow DNA sequence bases (ACGTN)
  • -r Allow RNA sequence bases (ACGUN)
  • -a Allow ambiguous sequence bases (RYKMSWBDHV)
  • -m Allow mask sequence base (X)
  • -u Allow uppercase sequence bases
  • -l Allow lowercase sequence bases
  • -p CHR Set the pair replacement character (default "%")
  • -b BUFSIZE Set the input buffer size
  • -B BUFSIZE Set the output buffer size
  • -q QUALTYPE Set the quality score encoding
  • -f FORMAT Set the input file format
  • -F FORMAT Set the output file format
  • -i Read interleaved input file pairs
  • -I Write interleaved output file pairs

CHR

This character will be replaced by the pair value when writing paired files.
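
The substitution described above can be illustrated with a small Python sketch. It is not taken from the fqtools source: the function name expand_pair_name and the pair values "1" and "2" are assumptions made only for this example.

```python
def expand_pair_name(template, pair_values=("1", "2"), pair_char="%"):
    """Return one file name per pair value.

    Every occurrence of pair_char in template is replaced by the pair value.
    The pair values "1" and "2" are an assumption for illustration.
    """
    if pair_char not in template:
        # Without the replacement character both names would be identical.
        raise ValueError(f"template {template!r} does not contain {pair_char!r}")
    return [template.replace(pair_char, value) for value in pair_values]


names = expand_pair_name("sample_R%.fastq.gz")
print(names)
```

Rejecting a template that lacks the replacement character is a choice made for this sketch, so that both members of a pair never map to the same name; fqtools itself may behave differently.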

BUFSIZE

Possible suffixes are [bkMG]. If no suffix is given, the value is in bytes.
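
As a rough illustration of how such a size string might be interpreted, the Python sketch below converts a BUFSIZE value to a byte count. The function name parse_bufsize is hypothetical, and the binary multipliers (k = 1024, M = 1024², G = 1024³) are an assumption; this document does not state which multipliers fqtools uses.

```python
# Assumed multipliers (binary units); fqtools may define them differently.
SUFFIX_MULTIPLIERS = {"b": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_bufsize(text):
    """Convert a string such as '512', '4k' or '1M' into a number of bytes."""
    if not text:
        raise ValueError("empty buffer size")
    if text[-1] in SUFFIX_MULTIPLIERS:
        number, multiplier = text[:-1], SUFFIX_MULTIPLIERS[text[-1]]
    else:
        # No suffix: the value is taken to be in bytes.
        number, multiplier = text, 1
    if not number.isdecimal():
        raise ValueError(f"invalid buffer size: {text!r}")
    return int(number) * multiplier


for example in ("512", "4k", "1M"):
    print(example, "->", parse_bufsize(example))
```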

QUALTYPE

  • u Do not assume a specific quality score encoding
  • s Interpret quality scores as Sanger encoded
  • o Interpret quality scores as Solexa encoded
  • i Interpret quality scores as Illumina encoded
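
For readers unfamiliar with these encodings, the following Python sketch decodes a quality string under the conventional definitions: Sanger stores Phred scores with an ASCII offset of 33, Solexa stores Solexa-scaled scores with an offset of 64, and Illumina (1.3+) stores Phred scores with an offset of 64. That the fqtools codes s, o and i correspond exactly to these conventions is an assumption, and the function names are hypothetical.

```python
import math

# Conventional ASCII offsets, keyed by the QUALTYPE codes listed above (assumed mapping).
OFFSETS = {"s": 33, "o": 64, "i": 64}


def decode_quality(quality, qualtype):
    """Return the numeric scores for a quality string under the given encoding."""
    offset = OFFSETS[qualtype]
    return [ord(char) - offset for char in quality]


def solexa_to_phred(solexa_score):
    """Convert a Solexa-scaled score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (solexa_score / 10) + 1)


print(decode_quality("II5+", "s"))
print(decode_quality("hhT@", "i"))
print(round(solexa_to_phred(0), 2))
```

The type u (no assumed encoding) is deliberately left out of the table, since no offset can be applied without knowing the encoding.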

FORMAT

  • F uncompressed FASTQ format (.fastq)
  • f compressed FASTQ format (.fastq.gz)
  • b unaligned BAM format (.bam)
  • u attempt to infer format from file extension (default .fastq.gz)
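
The inference performed for format u could look roughly like the Python sketch below. It is only an illustration: the function name infer_format is hypothetical, the match is case-sensitive by assumption, and treating the stated default (.fastq.gz) as the fallback for unrecognised extensions is an interpretation of the list above, not confirmed behaviour.

```python
def infer_format(filename):
    """Map a file name to one of the FORMAT codes F, f or b."""
    if filename.endswith(".bam"):
        return "b"  # unaligned BAM
    if filename.endswith(".fastq.gz"):
        return "f"  # compressed FASTQ
    if filename.endswith(".fastq"):
        return "F"  # uncompressed FASTQ
    # Assumed fallback: the documented default, compressed FASTQ.
    return "f"


for name in ("reads.fastq", "reads.fastq.gz", "reads.bam", "reads.txt"):
    print(name, "->", infer_format(name))
```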