fq

fq is a command-line tool to filter, generate, subsample, and validate FASTQ files.

Install

There are different methods to install fq.

Releases

Precompiled binaries are built for modern Linux distributions (x86_64-unknown-linux-gnu), macOS (x86_64-apple-darwin), and Windows (x86_64-pc-windows-msvc). The Linux binaries require glibc 2.18+ (CentOS/RHEL 8+, Debian 8+, Ubuntu 14.04+, etc.).

Conda

fq is available via Bioconda.

$ conda install fq=0.9.1

Manual

Clone the repository and use Cargo to install fq.

$ git clone --depth 1 --branch v0.9.1 https://github.com/stjude-rust-labs/fq.git
$ cd fq
$ cargo install --locked --path .

Container image

Container images are managed by Bioconda and available through Quay.io, e.g., using Docker:

$ docker image pull quay.io/biocontainers/fq:<tag>

See the repository tags for the available tags.

Alternatively, build the development container image:

$ git clone --depth 1 --branch v0.9.1 https://github.com/stjude-rust-labs/fq.git
$ cd fq
$ docker image build --tag fq:0.9.1 .

Usage

fq provides subcommands for filtering, generating, subsampling, and validating FASTQ files.

filter

fq filter takes an allowlist of record names and filters a given FASTQ file. The result includes only the records in the allowlist.

Usage

fq-filter
Filters a FASTQ from an allowlist of names

USAGE:
    fq filter --names <path> <src>

ARGS:
    <src>    Source FASTQ

OPTIONS:
    -h, --help            Print help information
        --names <path>    Allowlist of record names
    -V, --version         Print version information

Examples

# Filters an input FASTQ using the given allowlist.
$ fq filter --names allowlist.txt in.fastq
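
To illustrate what allowlist filtering means, here is a minimal Python sketch. It is not fq's implementation. It assumes the allowlist holds one record name per entry and that a record's name is the text after "@" up to the first whitespace; the helper names (read_records, record_name, filter_records) are hypothetical.

```python
import io

def read_records(handle):
    """Yield (name_line, sequence, plus_line, quality) tuples from a FASTQ stream."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return  # end of input
        yield tuple(lines)

def record_name(name_line):
    # Assumption: the name is the text after "@" up to the first whitespace.
    return name_line[1:].split()[0]

def filter_records(records, allowlist):
    """Keep only the records whose name appears in the allowlist."""
    names = set(allowlist)
    for record in records:
        if record_name(record[0]) in names:
            yield record

fastq = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTGA\n+\nIIII\n"
    "@r3\nGGCC\n+\nIIII\n"
)
allowlist = ["r1", "r3"]
kept = [record_name(r[0]) for r in filter_records(read_records(fastq), allowlist)]
print(kept)
```

Holding the allowlist in a set keeps each lookup constant-time, so the input only needs to be streamed once.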

generate

fq generate is a FASTQ file pair generator. It creates two reads, formatting names as described by Illumina.

While generate creates "valid" FASTQ reads, the content of the files is completely random. The sequences do not align to any genome.

Usage

fq-generate
Generates a random FASTQ file pair

USAGE:
    fq generate [OPTIONS] <r1-dst> <r2-dst>

ARGS:
    <r1-dst>    Read 1 destination. Output will be gzipped if ends in `.gz`.
    <r2-dst>    Read 2 destination. Output will be gzipped if ends in `.gz`.

OPTIONS:
    -h, --help                   Print help information
    -n, --record-count <u64>     Number of records to generate [default: 10000]
        --read-length <usize>    Number of bases in the sequence [default: 101]
    -s, --seed <u64>             Seed to use for the random number generator
    -V, --version                Print version information

Examples

# Generates the default number of records, written to uncompressed files.
$ fq generate /tmp/r1.fastq /tmp/r2.fastq

# Generates FASTQ paired reads with 32 records, written to gzipped outputs.
$ fq generate --record-count 32 /tmp/r1.fastq.gz /tmp/r2.fastq.gz
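
As a rough picture of what a random, seeded FASTQ pair looks like, the following Python sketch builds a few paired records in memory. It is not how fq generates data. The name layout is only loosely modeled on Illumina read names, and the placeholder values ("sketch", "FC0") and function names are hypothetical.

```python
import random

BASES = "ACGT"

def random_record(rng, index, read, read_length):
    # Hypothetical name layout, loosely modeled on Illumina read names:
    # instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    name = f"@sketch:1:FC0:1:1:{index}:{index} {read}:N:0:1"
    sequence = "".join(rng.choice(BASES) for _ in range(read_length))
    # Printable quality characters between "!" (33) and "~" (126).
    quality = "".join(chr(rng.randint(33, 126)) for _ in range(read_length))
    return [name, sequence, "+", quality]

def generate_pair(seed, record_count=3, read_length=8):
    """Return two lists of records (read 1 and read 2) from a seeded RNG."""
    rng = random.Random(seed)
    r1 = [random_record(rng, i, 1, read_length) for i in range(record_count)]
    r2 = [random_record(rng, i, 2, read_length) for i in range(record_count)]
    return r1, r2

r1, r2 = generate_pair(seed=13)
for record in r1:
    print("\n".join(record))
```

With the same seed the sketch returns the same records, which is the kind of reproducibility a seed option is meant to provide.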

lint

fq lint is a FASTQ file pair validator.

Usage

fq-lint
Validates a FASTQ file pair

USAGE:
    fq lint [OPTIONS] <r1-src> [--] [r2-src]

ARGS:
    <r1-src>    Read 1 source. Accepts both raw and gzipped FASTQ inputs.
    <r2-src>    Read 2 source. Accepts both raw and gzipped FASTQ inputs.

OPTIONS:
        --disable-validator <str>
            Disable validators by code. Use multiple times to disable more than one.

    -h, --help
            Print help information

        --lint-mode <str>
            Panic on first error or log all errors [default: panic] [possible values: panic, log]

        --paired-read-validation-level <str>
            Only use paired read validators up to a given level [default: high] [possible values:
            low, medium, high]

        --single-read-validation-level <str>
            Only use single read validators up to a given level [default: high] [possible values:
            low, medium, high]

    -V, --version
            Print version information

Validators

fq lint includes a set of validators that run on single or paired records. By default, records are validated with all rules, but validators can be disabled using --disable-validator CODE, where CODE is one of the validators listed below.

Single

Code	Level	Name	Validation
S001	low	PlusLine	Plus line starts with a "+".
S002	medium	Alphabet	All characters in sequence line are one of "ACGTN", case-insensitive.
S003	high	Name	Name line starts with an "@".
S004	low	Complete	All four record lines (name, sequence, plus line, and quality) are present.
S005	high	ConsistentSeqQual	Sequence and quality lengths are the same.
S006	medium	QualityString	All characters in quality line are between "!" and "~" (ordinal values).
S007	high	DuplicateName	All record names are unique.

Paired

Code	Level	Name	Validation
P001	medium	Names	Each paired read name is the same, excluding interleave.
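
The rules in the tables above can be expressed as simple checks. The Python sketch below is an illustration of the rule descriptions, not fq's code. It assumes "interleave" in P001 means a trailing "/1" or "/2" on the name, and the function names are hypothetical.

```python
def lint_single(lines, disabled=()):
    """Return the codes of single-read rules violated by one record (a list of lines)."""
    failed = []
    if len(lines) < 4:
        failed.append("S004")  # Complete: all four lines must be present
    else:
        name, sequence, plus, quality = lines[:4]
        if not plus.startswith("+"):
            failed.append("S001")  # PlusLine
        if any(base not in "ACGTN" for base in sequence.upper()):
            failed.append("S002")  # Alphabet, case-insensitive
        if not name.startswith("@"):
            failed.append("S003")  # Name
        if len(sequence) != len(quality):
            failed.append("S005")  # ConsistentSeqQual
        if any(not ("!" <= ch <= "~") for ch in quality):
            failed.append("S006")  # QualityString
    return [code for code in failed if code not in disabled]

def has_duplicate_names(records):
    """S007: report whether any name line occurs more than once."""
    names = [record[0] for record in records]
    return len(names) != len(set(names))

def strip_interleave(name_line):
    # Assumption: the interleave is a trailing "/1" or "/2" on the name.
    name = name_line.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name

def lint_pair(r1_name, r2_name):
    """P001: paired names must match once the interleave is removed."""
    same = strip_interleave(r1_name) == strip_interleave(r2_name)
    return [] if same else ["P001"]

print(lint_single(["r1", "ACGX", "-", "II~"]))
print(lint_pair("@r1/1", "@r1/2"))
```

Passing a collection of codes as disabled mirrors the effect of repeating --disable-validator on the command line.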

Examples

# Validate both reads using all validators. Exits cleanly (0) if no validation
# errors occur.
$ fq lint r1.fastq r2.fastq

# Log errors instead of quitting on first error.
$ fq lint --lint-mode log r1.fastq r2.fastq

# Disable validators S004 and S007.
$ fq lint --disable-validator S004 --disable-validator S007 r1.fastq r2.fastq

subsample

fq subsample outputs a subset of records from single or paired FASTQ files.

When using a probability (-p, --probability), each file is read through once, and each record is kept with that probability. Because selection is random, the output record count will not be exact but will be statistically close to the probability multiplied by the input record count.

When using a record count (-n, --record-count), the first input is read twice, but exactly that number of records is selected.

A seed (-s, --seed) can be provided to influence the results, e.g., for a deterministic subset of records.

For paired input, the sampling is applied to each pair.
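
The two sampling modes described above can be sketched in Python as follows. This is an illustration under stated assumptions, not fq's implementation: it assumes the exact-count mode uses its first pass to count records and its second pass to pick them, and the function names are hypothetical.

```python
import random

def sample_by_probability(records, probability, seed=None):
    """Single pass: keep each record independently with the given probability."""
    rng = random.Random(seed)
    return [record for record in records if rng.random() < probability]

def sample_exact(read_records, record_count, seed=None):
    """Two passes: read_records is a callable that returns a fresh iterator."""
    rng = random.Random(seed)
    total = sum(1 for _ in read_records())  # first pass: count the records
    keep = set(rng.sample(range(total), min(record_count, total)))
    # Second pass: emit only the records at the chosen positions.
    return [record for i, record in enumerate(read_records()) if i in keep]

r1 = [f"r{i}/1" for i in range(1000)]
r2 = [f"r{i}/2" for i in range(1000)]

# Paired input: one decision per pair, so mates stay together.
pairs = sample_by_probability(list(zip(r1, r2)), 0.25, seed=13)
exact = sample_exact(lambda: iter(r1), 100, seed=13)

print(len(pairs), len(exact))
```

The probability mode never needs the total record count, which is why a single pass suffices; the exact mode trades a second read of the input for a guaranteed output size.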

Usage

fq-subsample
Outputs a subset of records

USAGE:
    fq subsample [OPTIONS] --probability <f64> --record-count <u64> --r1-dst <path> <r1-src> [r2-src]

ARGS:
    <r1-src>    Read 1 source. Accepts both raw and gzipped FASTQ inputs.
    <r2-src>    Read 2 source. Accepts both raw and gzipped FASTQ inputs.

OPTIONS:
    -h, --help                  Print help information
    -n, --record-count <u64>    The exact number of records to keep. Cannot be used with
                                `probability`.
    -p, --probability <f64>     The probability a record is kept, as a percentage [0, 1]. Cannot be
                                used with `record-count`.
        --r1-dst <path>         Read 1 destination. Output will be gzipped if ends in `.gz`.
        --r2-dst <path>         Read 2 destination. Output will be gzipped if ends in `.gz`.
    -s, --seed <u64>            Seed to use for the random number generator
    -V, --version               Print version information

Examples

# Sample ~50% of records from a single FASTQ file
$ fq subsample --probability 0.5 --r1-dst r1.50pct.fastq r1.fastq

# Sample ~50% of records from a single FASTQ file and seed the RNG
$ fq subsample --probability 0.5 --seed 13 --r1-dst r1.50pct.fastq r1.fastq

# Sample ~25% of records from paired FASTQ files
$ fq subsample --probability 0.25 --r1-dst r1.25pct.fastq --r2-dst r2.25pct.fastq r1.fastq r2.fastq

# Sample ~10% of records from a gzipped FASTQ file and compress output
$ fq subsample --probability 0.1 --r1-dst r1.10pct.fastq.gz r1.fastq.gz

# Sample exactly 10000 records from a single FASTQ file
$ fq subsample --record-count 10000 --r1-dst r1.10k.fastq r1.fastq
