lh3 / Seqtk

Licence: MIT

Toolkit for processing sequences in FASTA/Q formats

Labels

bioinformatics

Projects that are alternatives of or similar to Seqtk

Wdl

Workflow Description Language - Specification and Implementations

Stars: ✭ 438 (-45.18%)

Mutual labels: bioinformatics

Htslib

C library for high-throughput sequencing data formats

Stars: ✭ 529 (-33.79%)

Mutual labels: bioinformatics

Nucleus

Python and C++ code for reading and writing genomics data.

Stars: ✭ 657 (-17.77%)

Mutual labels: bioinformatics

Vsearch

Versatile open-source tool for microbiome analysis

Stars: ✭ 444 (-44.43%)

Mutual labels: bioinformatics

Biostar Central

Biostar Q&A

Stars: ✭ 488 (-38.92%)

Mutual labels: bioinformatics

Csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang

Stars: ✭ 566 (-29.16%)

Mutual labels: bioinformatics

Biojava

📖🔬☕️ BioJava is an open-source project dedicated to providing a Java library for processing biological data.

Stars: ✭ 434 (-45.68%)

Mutual labels: bioinformatics

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (-11.64%)

Mutual labels: bioinformatics

Ncbi Genome Download

Scripts to download genomes from the NCBI FTP servers

Stars: ✭ 494 (-38.17%)

Mutual labels: bioinformatics

Khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more

Stars: ✭ 640 (-19.9%)

Mutual labels: bioinformatics

Deeptools

Tools to process and analyze deep sequencing data.

Stars: ✭ 448 (-43.93%)

Mutual labels: bioinformatics

Bioawk

BWK awk modified for biological data

Stars: ✭ 462 (-42.18%)

Mutual labels: bioinformatics

Getting Started With Genomics Tools And Resources

Unix, R and python tools for genomics and data science

Stars: ✭ 587 (-26.53%)

Mutual labels: bioinformatics

Mmseqs2

MMseqs2: ultra fast and sensitive search and clustering suite

Stars: ✭ 441 (-44.81%)

Mutual labels: bioinformatics

Cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments

Stars: ✭ 655 (-18.02%)

Mutual labels: bioinformatics

Circosjs

d3 library to build circular graphs

Stars: ✭ 436 (-45.43%)

Mutual labels: bioinformatics

Cs Video Courses

List of Computer Science courses with video lectures.

Stars: ✭ 27,209 (+3305.38%)

Mutual labels: bioinformatics

Multiqc

Aggregate results from bioinformatics analyses across many samples into a single report.

Stars: ✭ 708 (-11.39%)

Mutual labels: bioinformatics

React Plotly.js

A plotly.js React component from Plotly 📈

Stars: ✭ 701 (-12.27%)

Mutual labels: bioinformatics

Seqkit

A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang

Stars: ✭ 607 (-24.03%)

Mutual labels: bioinformatics

Introduction

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can optionally be compressed with gzip. To install seqtk:

```
  git clone https://github.com/lh3/seqtk.git
  cd seqtk; make
```

The only library dependency is zlib.

Seqtk Examples

Convert FASTQ to FASTA:
```
  seqtk seq -a in.fq.gz > out.fa
```
Convert Illumina 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercase (the first command) or to N (the second):
```
  seqtk seq -aQ64 -q20 in.fq > out.fa
  seqtk seq -aQ64 -q20 -n N in.fq > out.fa
```
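The `-Q64` and `-q20` options above operate on encoded quality characters. As a rough illustration, assuming the usual Phred encoding in which a score q is stored as the ASCII character with code offset + q, the sketch below prints the character that encodes a score of 20 under the two common offsets. The helper `qual_char` is made up for this example and is not part of seqtk.

```shell
# Print the quality character that encodes Phred score $2 under offset $1.
# Assumption: score q is stored as the ASCII character with code (offset + q).
qual_char() {
  awk -v offset="$1" -v q="$2" 'BEGIN { printf "%c\n", offset + q }'
}

qual_char 64 20   # Phred+64, as used by Illumina 1.3+ (prints T)
qual_char 33 20   # Phred+33, the Sanger convention (prints 5)
```

Under Phred+64, quality characters that sort before `T` in ASCII order therefore correspond to scores below 20. Passing the wrong offset shifts every score by 31, which is why `-Q64` matters for older Illumina files.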
Fold long FASTA/Q lines and remove FASTA/Q comments:
```
  seqtk seq -Cl60 in.fa > out.fa
```
Convert multi-line FASTQ to 4-line FASTQ:
```
  seqtk seq -l0 in.fq > out.fq
```
Reverse complement FASTA/Q:
```
  seqtk seq -r in.fq > out.fq
```
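For readers unfamiliar with the operation, reverse complementing reads the sequence backwards and swaps each base with its partner (A with T, C with G). The following is a minimal sketch using only awk and tr, not seqtk's own implementation. It handles plain A, C, G and T in either case and leaves other characters such as N unchanged. The function name `revcomp` is hypothetical.

```shell
# Reverse each input line with awk, then complement the bases with tr.
# Only A/C/G/T (upper or lower case) are swapped; anything else passes through.
revcomp() {
  awk '{
    out = ""
    for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
    print out
  }' | tr 'ACGTacgt' 'TGCAtgca'
}

printf 'AACGTN\n' | revcomp   # prints NACGTT
```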
Extract sequences with names in file name.lst, one sequence name per line:
```
  seqtk subseq in.fq name.lst > out.fq
```
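The name list can be prepared with standard tools. The following is a minimal sketch, assuming a 4-line FASTQ in which the read name is the first whitespace-delimited token of each header line. The files `toy.fq` and `toy.lst` and the reads in them are made up for illustration, and the seqtk call only runs if seqtk is installed.

```shell
# Build a tiny 4-line FASTQ for illustration (hypothetical reads).
printf '@r1 first\nACGT\n+\nIIII\n@r2 second\nTTGA\n+\nIIII\n@r3 third\nGGCC\n+\nIIII\n' > toy.fq

# Take every header line (lines 1, 5, 9, ...), drop the leading "@",
# and keep only the name before the first whitespace.
awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' toy.fq > all_names.lst

# Keep the reads of interest, here everything except r2.
grep -v '^r2$' all_names.lst > toy.lst

# Run the extraction only if seqtk is on the PATH.
if command -v seqtk >/dev/null 2>&1; then
  seqtk subseq toy.fq toy.lst > toy_out.fq
fi
```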
Extract sequences in regions contained in file reg.bed:
```
  seqtk subseq in.fa reg.bed > out.fa
```
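As a sketch of what a region file can look like, the example below assumes the usual BED convention: tab-separated columns holding the sequence name, a 0-based start, and an exclusive end, so that a region's length is end minus start. The files `toy.fa` and `toy.bed` and their contents are hypothetical, and the seqtk call only runs if seqtk is installed.

```shell
# Hypothetical two-record FASTA, 10 bases per record.
printf '>chr1\nACGTACGTAC\n>chr2\nTTTTGGGGCC\n' > toy.fa

# Two regions in BED form: name, 0-based start, exclusive end.
# chr1 0 4 covers the first 4 bases; chr2 2 9 covers 7 bases.
printf 'chr1\t0\t4\nchr2\t2\t9\n' > toy.bed

# Under this convention the length of each region is end - start.
awk -F '\t' '{ print $1, $3 - $2 }' toy.bed

# Run the extraction only if seqtk is on the PATH.
if command -v seqtk >/dev/null 2>&1; then
  seqtk subseq toy.fa toy.bed > toy_regions.fa
fi
```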
Mask regions in reg.bed to lowercase:
```
  seqtk seq -M reg.bed in.fa > out.fa
```
Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):
```
  seqtk sample -s100 read1.fq 10000 > sub1.fq
  seqtk sample -s100 read2.fq 10000 > sub2.fq
```
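Whether two subsampled files still line up can be checked by comparing their read names in order. The following is a minimal sketch, assuming 4-line FASTQ and read names that differ at most by a trailing `/1` or `/2`. The helpers `read_names` and `same_pairing` and the toy files are made up for this example.

```shell
# Print the read names of a 4-line FASTQ, without "@" and any /1 or /2 suffix.
read_names() {
  awk 'NR % 4 == 1 {
    name = $1
    sub(/^@/, "", name)
    sub(/\/[12]$/, "", name)
    print name
  }' "$1"
}

# Succeed only if both files list the same read names in the same order.
same_pairing() {
  read_names "$1" > names1.tmp
  read_names "$2" > names2.tmp
  cmp -s names1.tmp names2.tmp
  status=$?
  rm -f names1.tmp names2.tmp
  return $status
}

# Hypothetical stand-ins for two subsampled files.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' > toy1.fq
printf '@r1/2\nCCGT\n+\nIIII\n@r2/2\nAAGA\n+\nIIII\n' > toy2.fq

if same_pairing toy1.fq toy2.fq; then
  echo "pairing kept"
else
  echo "pairing broken"
fi
```

The same check can be pointed at sub1.fq and sub2.fq after subsampling. A mismatch suggests that the two commands used different seeds or different input orderings.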
Trim low-quality bases from both ends using the Phred algorithm:
```
  seqtk trimfq in.fq > out.fq
```
Trim 5 bp from the left end of each read and 10 bp from the right end:
```
  seqtk trimfq -b 5 -e 10 in.fa > out.fa
```

Stars: ✭ 799