lh3 / Seqtk

License: MIT
Toolkit for processing sequences in FASTA/Q formats

Language: C

Introduction

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It parses both FASTA and FASTQ files, which may optionally be gzip-compressed. To install seqtk:

git clone https://github.com/lh3/seqtk.git;
cd seqtk; make

The only library dependency is zlib.
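
To try a fresh build without real data, a tiny FASTQ file can be written by hand. The following is a minimal sketch, not part of seqtk itself: the file name tiny.fq and the read contents are made up for illustration, and the final step assumes the binary was built in the current directory.

```shell
# Write a two-read FASTQ file (4 lines per record, Phred+33 qualities).
# The file name and read contents are hypothetical test data.
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'TTGCATGCAA' '+' 'IIII####II' \
  > tiny.fq

# Convert it to FASTA, but only if ./seqtk exists and is executable.
if [ -x ./seqtk ]; then
  ./seqtk seq -a tiny.fq
fi
```

If the build succeeded, the last command should print the two reads as FASTA records.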

Seqtk Examples

  • Convert FASTQ to FASTA:

      seqtk seq -a in.fq.gz > out.fa
    
  • Convert Illumina 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercase (the 1st command line) or to N (the 2nd):

      seqtk seq -aQ64 -q20 in.fq > out.fa
      seqtk seq -aQ64 -q20 -n N in.fq > out.fa
    
  • Fold long FASTA/Q lines and remove FASTA/Q comments:

      seqtk seq -Cl60 in.fa > out.fa
    
  • Convert multi-line FASTQ to 4-line FASTQ:

      seqtk seq -l0 in.fq > out.fq
    
  • Reverse complement FASTA/Q:

      seqtk seq -r in.fq > out.fq
    
  • Extract sequences with names in file name.lst, one sequence name per line:

      seqtk subseq in.fq name.lst > out.fq
    
  • Extract sequences in regions contained in file reg.bed:

      seqtk subseq in.fa reg.bed > out.fa
    
  • Mask regions in reg.bed to lowercase:

      seqtk seq -M reg.bed in.fa > out.fa
    
  • Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):

      seqtk sample -s100 read1.fq 10000 > sub1.fq
      seqtk sample -s100 read2.fq 10000 > sub2.fq
    
  • Trim low-quality bases from both ends using the Phred algorithm:

      seqtk trimfq in.fq > out.fq
    
  • Trim 5bp from the left end of each read and 10bp from the right end:

      seqtk trimfq -b 5 -e 10 in.fa > out.fa
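
Several examples above depend on helper files (name.lst, reg.bed) or on read pairing staying intact after subsampling. The sketch below shows one way to prepare and check these with standard Unix tools. It does not call seqtk. The demo_1.fq and demo_2.fq files, the read names, the chr1 region, and the mate_names helper are all hypothetical. It assumes 4-line FASTQ records and mate names that differ only by a trailing /1 or /2.

```shell
# Hypothetical demo inputs: two paired 4-line FASTQ files.
printf '%s\n' '@r1/1' 'ACGT' '+' 'IIII' '@r2/1' 'GGCC' '+' 'IIII' > demo_1.fq
printf '%s\n' '@r1/2' 'TTAA' '+' 'IIII' '@r2/2' 'CCGG' '+' 'IIII' > demo_2.fq

# name.lst: one sequence name per line, taken from each header line
# (the text after "@" up to the first whitespace).
awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' demo_1.fq > name.lst

# reg.bed: tab-separated name, start, end. In the BED format, starts are
# 0-based and ends are exclusive, so this line covers bases 101 to 200 of a
# hypothetical sequence named chr1 in 1-based coordinates.
printf 'chr1\t100\t200\n' > reg.bed

# Print read names with the leading "@" and any trailing /1 or /2 removed.
mate_names() {
  awk 'NR % 4 == 1 { sub(/^@/, "", $1); sub(/\/[12]$/, "", $1); print $1 }' "$1"
}

# Two files are still paired if they list the same names in the same order.
if [ "$(mate_names demo_1.fq)" = "$(mate_names demo_2.fq)" ]; then
  pairing=ok
else
  pairing=broken
fi
echo "pairing: $pairing"
```

The same comparison can be pointed at sub1.fq and sub2.fq after subsampling: if a different seed was used for each file, the name lists would be expected to differ.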
    