seq-lang / Seq

Licence: apache-2.0

A high-performance, Pythonic language for bioinformatics

Seq — a language for bioinformatics

Introduction

A strongly-typed and statically-compiled high-performance Pythonic language!

Seq is a programming language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++.

Think of Seq as a strongly-typed and statically-compiled Python: all the bells and whistles of Python, boosted with a strong type system, without any performance overhead.

Seq can outperform Python code by up to 160x. It can also beat equivalent C/C++ code by up to 2x without any manual intervention, and it natively supports parallelism. Implementation details and benchmarks are discussed in our paper.

Learn more by following the tutorial or browsing the cookbook.

Examples

Seq is a Python-compatible language, and the vast majority of Python programs should work without any modifications:

def check_prime(n):
    if n > 1:
        for i in range(2, n):
            if n % i == 0:
                return False
        return True
    else:
        return False

n = 1009
print n, 'is', 'a' if check_prime(n) else 'not a', 'prime'

Here is an example showcasing Seq's bioinformatics features:

s = s'ACGTACGT'    # sequence literal
print s[2:5]       # subsequence
print ~s           # reverse complement
kmer = Kmer[8](s)  # convert to k-mer
type K2 = Kmer[2]  # type definition

# iterate over length-3 subsequences
# with step 2
for sub in s.split(3, step=2):
    print sub[-1]  # last base

    # iterate over 2-mers with step 1
    for kmer in sub.kmers[K2](step=1):
        print ~kmer  # '~' also works on k-mers

Seq provides native sequence and k-mer types, e.g. an 8-mer is represented by Kmer[8] as above.
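For readers who do not have Seq installed, the operations in the example above can be approximated in plain Python. This is only an illustrative sketch, not how Seq implements its sequence and k-mer types: the names `reverse_complement` and `windows` are hypothetical helpers, sequences are ordinary strings restricted to A, C, G and T, and the sketch assumes that only full-length windows are produced, which may differ from Seq's own `split` and `kmers`.

```python
# Plain-Python approximation of the Seq example above (not Seq's implementation).
# Assumption: sequences are ordinary strings over the alphabet A, C, G, T.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Rough equivalent of Seq's '~' operator: complement each base, then reverse."""
    return s.translate(COMPLEMENT)[::-1]


def windows(s, k, step=1):
    """Yield length-k substrings of s, advancing by `step` bases each time.

    Assumption: trailing windows shorter than k are dropped.
    """
    for start in range(0, len(s) - k + 1, step):
        yield s[start:start + k]


s = "ACGTACGT"
print(s[2:5])                 # subsequence
print(reverse_complement(s))  # reverse complement

# iterate over length-3 subsequences with step 2
for sub in windows(s, 3, step=2):
    print(sub[-1])            # last base
    # iterate over 2-mers with step 1
    for kmer in windows(sub, 2, step=1):
        print(reverse_complement(kmer))
```

Unlike this string-based sketch, Seq's `Kmer[k]` is a native, statically sized type, so the length of a k-mer is part of its type rather than a runtime argument.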

Here is a more complex example that counts occurrences of subsequences from a FASTQ file (argv[2]) in sequences obtained from a FASTA file (argv[1]) using an FM-index:

from sys import argv
from bio.fmindex import FMIndex
fmi = FMIndex(argv[1])
k, step, n = 20, 20, 0

def add(count: int):
    global n
    n += count

@prefetch
def search(s: seq, fmi: FMIndex):
    intv = fmi.interval(s[-1])
    s = s[:-1]  # trim last base
    while s and intv:
        # backwards-extend intv
        intv = fmi[intv, s[-1]]
        s = s[:-1]  # trim last
    # return count of occurrences
    return len(intv)

FASTQ(argv[2]) |> seqs |> split(k, step) |> search(fmi) |> add
print 'total:', n

The @prefetch annotation tells the compiler to perform a coroutine-based pipeline transformation to make the FM-index queries faster, by overlapping the cache miss latency from one query with other useful work. In practice, the single @prefetch line can provide a 2x performance improvement.
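The `search` function above counts matches by extending an FM-index interval backwards, one base at a time. A toy version of that general technique (FM-index backward search) can be sketched in plain Python as follows. This is an assumption-laden illustration, not the `bio.fmindex` implementation: `build_fm_index` and `count_occurrences` are hypothetical names, the suffix array is built naively, the occurrence table is stored in full, and nothing here models what `@prefetch` does.

```python
# Toy FM-index with backward search (illustrative only; not Seq's bio.fmindex).
from collections import Counter


def build_fm_index(text):
    """Return (c_table, occ, n) for `text` with a '$' sentinel appended."""
    text += "$"  # sentinel; assumed to sort before every other character
    # Naive suffix array: acceptable for a demo, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return c_table, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count occurrences of a non-empty pattern via backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern from its last base
        if ch not in c_table:
            return 0
        # Backward extension: narrow the interval to rows preceded by ch.
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0  # interval is empty, so there is no match
    return hi - lo


index = build_fm_index("ACGTACGTAC")
print(count_occurrences("CGT", index))
```

Each backward extension reads two positions of the occurrence table that are effectively unpredictable, which is the kind of memory access pattern where overlapping one query's cache misses with work from other queries, as the @prefetch transformation is described as doing, can pay off.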

Install

Pre-built binaries

Pre-built binaries for Linux and macOS on x86_64 are available alongside each release. We also have a script for downloading and installing pre-built versions:

/bin/bash -c "$(curl -fsSL https://seq-lang.org/install.sh)"

Build from source

See Building from Source.

Documentation

Please check docs.seq-lang.org for in-depth documentation.

Citing Seq

If you use Seq in your research, please cite:

Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Bonnie Berger, and Saman Amarasinghe. 2019. Seq: a high-performance language for bioinformatics. Proc. ACM Program. Lang. 3, OOPSLA, Article 125 (October 2019), 29 pages. DOI: https://doi.org/10.1145/3360551

BibTeX:

@article{Shajii:2019:SHL:3366395.3360551,
 author = {Shajii, Ariya and Numanagi\'{c}, Ibrahim and Baghdadi, Riyadh and Berger, Bonnie and Amarasinghe, Saman},
 title = {Seq: A High-performance Language for Bioinformatics},
 journal = {Proc. ACM Program. Lang.},
 issue_date = {October 2019},
 volume = {3},
 number = {OOPSLA},
 month = oct,
 year = {2019},
 issn = {2475-1421},
 pages = {125:1--125:29},
 articleno = {125},
 numpages = {29},
 url = {http://doi.acm.org/10.1145/3360551},
 doi = {10.1145/3360551},
 acmid = {3360551},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Python, bioinformatics, computational biology, domain-specific language, optimization, programming language},
}
