stjude-rust-labs / fq

Licence: MIT license
Command line utility for manipulating Illumina-generated FastQ files.

fq

fq is a command line utility to filter, generate, subsample, and validate FASTQ files.

Install

fq can be installed from a precompiled release, via Conda, manually from source, or as a container image.

Releases

Precompiled binaries are built for modern Linux distributions (x86_64-unknown-linux-gnu), macOS (x86_64-apple-darwin), and Windows (x86_64-pc-windows-msvc). The Linux binaries require glibc 2.18+ (CentOS/RHEL 8+, Debian 8+, Ubuntu 14.04+, etc.).

Conda

fq is available via Bioconda.

$ conda install fq=0.9.1

Manual

Clone the repository and use Cargo to install fq.

$ git clone --depth 1 --branch v0.9.1 https://github.com/stjude-rust-labs/fq.git
$ cd fq
$ cargo install --locked --path .

Container image

Container images are managed by Bioconda and available through Quay.io, e.g., using Docker:

$ docker image pull quay.io/biocontainers/fq:<tag>

See the repository tags for the available tags.

Alternatively, build the development container image:

$ git clone --depth 1 --branch v0.9.1 https://github.com/stjude-rust-labs/fq.git
$ cd fq
$ docker image build --tag fq:0.9.1 .

Usage

fq provides subcommands for filtering, generating, subsampling, and validating FASTQ files.

filter

fq filter takes an allowlist of record names and filters a given FASTQ file. The result includes only the records in the allowlist.

Usage

fq-filter
Filters a FASTQ from an allowlist of names

USAGE:
    fq filter --names <path> <src>

ARGS:
    <src>    Source FASTQ

OPTIONS:
    -h, --help            Print help information
        --names <path>    Allowlist of record names
    -V, --version         Print version information

Examples

# Filters an input FASTQ using the given allowlist.
$ fq filter --names allowlist.txt in.fastq
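
The filtering step can be illustrated with a short Python sketch. This is not fq's implementation; it assumes the allowlist holds one record name per line and that a record's name is the text after "@" up to the first whitespace. The function names are hypothetical.

```python
import io

def read_records(handle):
    """Yield (name_line, sequence, plus_line, quality) tuples from a FASTQ stream."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:  # end of input
            return
        yield tuple(lines)

def record_name(name_line):
    # Assumption: the name is the text after "@" up to the first whitespace.
    fields = name_line[1:].split()
    return fields[0] if fields else ""

def filter_records(src, allowlist):
    """Keep only the records whose name appears in the allowlist."""
    names = {line.strip() for line in allowlist if line.strip()}
    for record in read_records(src):
        if record_name(record[0]) in names:
            yield record

fastq = io.StringIO(
    "@r1 1:N:0:1\nACGT\n+\nIIII\n"
    "@r2 1:N:0:1\nTTTT\n+\nIIII\n"
    "@r3 1:N:0:1\nGGGG\n+\nIIII\n"
)
allowlist = io.StringIO("r1\nr3\n")
kept = [record_name(record[0]) for record in filter_records(fastq, allowlist)]
print(kept)
```

Loading the allowlist into a set keeps each lookup constant time, so the source file is streamed once regardless of how many names are listed.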

generate

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina.

While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome.

Usage

fq-generate
Generates a random FASTQ file pair

USAGE:
    fq generate [OPTIONS] <r1-dst> <r2-dst>

ARGS:
    <r1-dst>    Read 1 destination. Output will be gzipped if ends in `.gz`.
    <r2-dst>    Read 2 destination. Output will be gzipped if ends in `.gz`.

OPTIONS:
    -h, --help                   Print help information
    -n, --record-count <u64>     Number of records to generate [default: 10000]
        --read-length <usize>    Number of bases in the sequence [default: 101]
    -s, --seed <u64>             Seed to use for the random number generator
    -V, --version                Print version information

Examples

# Generates the default number of records, written to uncompressed files.
$ fq generate /tmp/r1.fastq /tmp/r2.fastq

# Generates FASTQ paired reads with 32 records, written to gzipped outputs.
$ fq generate --record-count 32 /tmp/r1.fastq.gz /tmp/r2.fastq.gz
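
For readers who want to see what a random record pair looks like, here is a minimal Python sketch. It is not how fq generates records; the instrument, run, and flow cell values are made up, and only the colon-separated field layout follows Illumina's naming convention. The function name generate_pair is hypothetical.

```python
import random

def generate_pair(record_count, read_length, seed=None):
    """Yield (read 1, read 2) FASTQ records with matching names and random content."""
    rng = random.Random(seed)
    for i in range(1, record_count + 1):
        # Layout: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
        # All field values here are placeholders chosen for illustration.
        prefix = f"@sketch:1:FLOWCELL:1:1:{i}:{i}"
        pair = []
        for read in (1, 2):
            sequence = "".join(rng.choice("ACGT") for _ in range(read_length))
            # Phred+33 quality characters for scores 0 through 40.
            quality = "".join(chr(rng.randint(33, 73)) for _ in range(read_length))
            pair.append(f"{prefix} {read}:N:0:1\n{sequence}\n+\n{quality}\n")
        yield tuple(pair)

records = list(generate_pair(3, 8, seed=13))
for r1, r2 in records:
    print(r1, end="")
    print(r2, end="")
```

Passing the same seed reproduces the same records, which mirrors the purpose of the --seed option.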

lint

fq lint is a FASTQ file pair validator.

Usage

fq-lint
Validates a FASTQ file pair

USAGE:
    fq lint [OPTIONS] <r1-src> [--] [r2-src]

ARGS:
    <r1-src>    Read 1 source. Accepts both raw and gzipped FASTQ inputs.
    <r2-src>    Read 2 source. Accepts both raw and gzipped FASTQ inputs.

OPTIONS:
        --disable-validator <str>
            Disable validators by code. Use multiple times to disable more than one.

    -h, --help
            Print help information

        --lint-mode <str>
            Panic on first error or log all errors [default: panic] [possible values: panic, log]

        --paired-read-validation-level <str>
            Only use paired read validators up to a given level [default: high] [possible values:
            low, medium, high]

        --single-read-validation-level <str>
            Only use single read validators up to a given level [default: high] [possible values:
            low, medium, high]

    -V, --version
            Print version information

Validators

fq lint includes a set of validators that run on single or paired records. By default, records are validated with all rules, but validators can be disabled using --disable-validator CODE, where CODE is one of the validators listed below.

Single

Code  Level   Name               Validation
S001  low     PlusLine           Plus line starts with a "+".
S002  medium  Alphabet           All characters in sequence line are one of "ACGTN", case-insensitive.
S003  high    Name               Name line starts with an "@".
S004  low     Complete           All four record lines (name, sequence, plus line, and quality) are present.
S005  high    ConsistentSeqQual  Sequence and quality lengths are the same.
S006  medium  QualityString      All characters in quality line are between "!" and "~" (ordinal values).
S007  high    DuplicateName      All record names are unique.

Paired

Code  Level   Name   Validation
P001  medium  Names  Each paired read name is the same, excluding interleave.
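
The single-record rules can be expressed in a few lines of Python. The sketch below is an illustration of the table, not fq's code: it assumes that a validation level enables every validator at or below that level, and it leaves out S007, which needs state across records. The function name check_record is hypothetical.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}

def check_record(lines, level="high", disabled=()):
    """Return the codes of the validators that fail for one record."""
    def enabled(code, code_level):
        return code not in disabled and LEVELS[code_level] <= LEVELS[level]

    if len(lines) < 4:
        # S004: the remaining checks need all four lines, so stop here.
        return ["S004"] if enabled("S004", "low") else []

    name, sequence, plus, quality = lines[:4]
    failed = []
    if enabled("S001", "low") and not plus.startswith("+"):
        failed.append("S001")
    if enabled("S002", "medium") and any(c not in "ACGTN" for c in sequence.upper()):
        failed.append("S002")
    if enabled("S003", "high") and not name.startswith("@"):
        failed.append("S003")
    if enabled("S005", "high") and len(sequence) != len(quality):
        failed.append("S005")
    if enabled("S006", "medium") and any(not ("!" <= c <= "~") for c in quality):
        failed.append("S006")
    return failed

print(check_record(["@r1", "ACGTN", "+", "IIIII"]))
print(check_record(["r1", "ACGX", "-", "III"]))
print(check_record(["r1", "ACGX", "-", "III"], level="low"))
```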

Examples

# Validate both reads using all validators. Exits cleanly (0) if no validation
# errors occur.
$ fq lint r1.fastq r2.fastq

# Log errors instead of quitting on first error.
$ fq lint --lint-mode log r1.fastq r2.fastq

# Disable validators S004 and S007.
$ fq lint --disable-validator S004 --disable-validator S007 r1.fastq r2.fastq
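
The difference between the two lint modes, together with the paired name rule (P001), can be sketched in Python as follows. This is an assumption-laden illustration rather than fq's behavior: it treats a trailing "/1" or "/2" on the first whitespace-delimited token as the interleave, and the error message format and function names are invented.

```python
def base_name(name_line):
    # Assumption: the name is the first whitespace-delimited token, and a
    # trailing "/1" or "/2" is the interleave to ignore.
    tokens = name_line.split()
    token = tokens[0] if tokens else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token

def lint_pairs(r1_names, r2_names, mode="panic"):
    """Compare paired names; stop at the first mismatch or collect all of them."""
    errors = []
    for index, (n1, n2) in enumerate(zip(r1_names, r2_names), start=1):
        if base_name(n1) != base_name(n2):
            message = f"record {index}: [P001] names do not match: {n1!r} != {n2!r}"
            if mode == "panic":
                raise ValueError(message)  # stop on the first error
            errors.append(message)         # "log" mode: keep going
    return errors

r1_names = ["@a/1", "@b/1", "@c/1"]
r2_names = ["@a/2", "@x/2", "@y/2"]
for message in lint_pairs(r1_names, r2_names, mode="log"):
    print(message)
```

In "log" mode every mismatch is reported in one run, which is convenient when auditing a file; "panic" mode fails fast, which suits pipelines that only need a pass or fail result.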

subsample

fq subsample outputs a subset of records from single or paired FASTQ files.

When using a probability (-p, --probability), each file is read through once, and a subset of records is selected based on that chance. Given the randomness used when sampling a uniform distribution, the output record count will not be exact but (statistically) close.

When using a record count (-n, --record-count), the first input is read twice, but exactly the requested number of records is selected.

A seed (-s, --seed) can be provided to influence the results, e.g., for a deterministic subset of records.

For paired input, the sampling is applied to each pair.
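
Probability-based sampling can be sketched in Python as a single pass with one random draw per pair. This is a simplified illustration, not fq's implementation; the function name subsample_by_probability is hypothetical.

```python
import random

def subsample_by_probability(r1_records, r2_records, probability, seed=None):
    """Keep each pair with the given probability, in a single pass."""
    rng = random.Random(seed)
    for r1, r2 in zip(r1_records, r2_records):
        # One draw per pair, so both mates are kept or dropped together.
        if rng.random() < probability:
            yield r1, r2

r1 = [f"@r{i}/1" for i in range(1000)]
r2 = [f"@r{i}/2" for i in range(1000)]
sample = list(subsample_by_probability(r1, r2, 0.5, seed=13))
# The count is close to 500 but is not guaranteed to be exactly 500.
print(len(sample))
```

Because the decision is made per pair rather than per file, the two outputs stay in the same order and contain the same record names.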

Usage

fq-subsample
Outputs a subset of records

USAGE:
    fq subsample [OPTIONS] --probability <f64> --record-count <u64> --r1-dst <path> <r1-src> [r2-src]

ARGS:
    <r1-src>    Read 1 source. Accepts both raw and gzipped FASTQ inputs.
    <r2-src>    Read 2 source. Accepts both raw and gzipped FASTQ inputs.

OPTIONS:
    -h, --help                  Print help information
    -n, --record-count <u64>    The exact number of records to keep. Cannot be used with
                                `probability`.
    -p, --probability <f64>     The probability a record is kept, as a percentage [0, 1]. Cannot be
                                used with `record-count`.
        --r1-dst <path>         Read 1 destination. Output will be gzipped if ends in `.gz`.
        --r2-dst <path>         Read 2 destination. Output will be gzipped if ends in `.gz`.
    -s, --seed <u64>            Seed to use for the random number generator
    -V, --version               Print version information

Examples

# Sample ~50% of records from a single FASTQ file
$ fq subsample --probability 0.5 --r1-dst r1.50pct.fastq r1.fastq

# Sample ~50% of records from a single FASTQ file and seed the RNG
$ fq subsample --probability 0.5 --seed 13 --r1-dst r1.50pct.fastq r1.fastq

# Sample ~25% of records from paired FASTQ files
$ fq subsample --probability 0.25 --r1-dst r1.25pct.fastq --r2-dst r2.25pct.fastq r1.fastq r2.fastq

# Sample ~10% of records from a gzipped FASTQ file and compress output
$ fq subsample --probability 0.1 --r1-dst r1.10pct.fastq.gz r1.fastq.gz

# Sample exactly 10000 records from a single FASTQ file
$ fq subsample --record-count 10000 --r1-dst r1.10k.fastq r1.fastq
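
One plausible reason the record-count mode reads the first input twice is that an exact selection needs the total record count up front. The Python sketch below shows that two-pass idea under this assumption; it is not taken from fq's source, and open_records is a hypothetical callable that returns a fresh iterator over the input each time it is called.

```python
import random

def subsample_exact(open_records, record_count, seed=None):
    """Select exactly record_count records (or all, if fewer exist) in two passes."""
    # Pass 1: count the records.
    total = sum(1 for _ in open_records())
    rng = random.Random(seed)
    keep = set(rng.sample(range(total), min(record_count, total)))
    # Pass 2: emit the chosen records in their original order.
    for index, record in enumerate(open_records()):
        if index in keep:
            yield record

records = [f"@r{i}" for i in range(100)]
sample = list(subsample_exact(lambda: iter(records), 10, seed=13))
print(len(sample))
```

For paired input, the same set of kept indices would be applied to both files so that mates stay together.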