lh3 / Biofast

Benchmarking programming languages/implementations for common tasks in Bioinformatics

Introduction

Biofast is a small benchmark for evaluating the performance of programming languages and implementations on a few common tasks in the field of Bioinformatics. It currently includes two benchmarks: interval query and FASTQ parsing. Please see also the companion blog post.

Results

Setup

We ran the tests on a CentOS 7 server with two EPYC 7301 CPUs and 1 TB of memory. The system comes with gcc-4.8.5, python-3.7.6, nim-1.2.0, julia-1.4.1, go-1.14.3, luajit-322db02 and k8-0.2.5. Relatively small libraries are included in the lib directory.

We tried to avoid other active processes while the test programs were running. Timing on this page was obtained with hyperfine, which reports CPU time averaged over at least ten runs. Peak memory was often measured only once because hyperfine doesn't report memory usage.
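
To make the measurement idea concrete, here is a minimal Python sketch of averaging CPU time over several runs while also recording peak memory. It is not how hyperfine works internally and was not used to produce the numbers on this page; the function name `measure` is hypothetical, and the sketch relies on `os.wait4`, which is available on Unix only.

```python
import os
import subprocess
import sys


def measure(argv, runs=10):
    """Run `argv` `runs` times; return (mean CPU seconds, largest ru_maxrss).

    Hypothetical helper for illustration only. CPU time is user + system
    time of the child process. ru_maxrss is in kilobytes on Linux and in
    bytes on macOS, so interpret it per platform.
    """
    cpu_times = []
    peak_rss = 0
    for _ in range(runs):
        proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL)
        # os.wait4 reaps the child and returns its resource usage.
        _, status, usage = os.wait4(proc.pid, 0)
        if os.WIFEXITED(status):
            proc.returncode = os.WEXITSTATUS(status)
        else:
            proc.returncode = -os.WTERMSIG(status)
        if proc.returncode != 0:
            raise RuntimeError(f"command failed with code {proc.returncode}")
        cpu_times.append(usage.ru_utime + usage.ru_stime)
        peak_rss = max(peak_rss, usage.ru_maxrss)
    return sum(cpu_times) / len(cpu_times), peak_rss


if __name__ == "__main__":
    # Time a trivial command: the Python interpreter doing nothing.
    mean_cpu, rss = measure([sys.executable, "-c", "pass"], runs=3)
    print(f"mean CPU time: {mean_cpu:.3f} s, peak RSS: {rss}")
```

Taking peak memory from the same runs as the timing avoids a separate measurement pass, at the cost of portability.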

Full results can be found in the bedcov and fqcnt directories. This README only shows one implementation per language. We exclude implementations that bind to C libraries and try to select the one whose algorithm is closest to the C version.

Computing the depth and breadth of coverage from BED files

In this benchmark, we load one BED file into memory. We then stream another BED file and compute the coverage of each interval using the cgranges algorithm (see the C++ header for algorithm details). The output of all programs should be identical to that of "bedtools coverage". In the table below, "t" stands for CPU time in seconds and "M" for peak memory in megabytes. Subscripts "g2r" and "r2g" correspond to the following two command lines, respectively:

bedcov ex-rna.bed ex-anno.bed  # g2r
bedcov ex-anno.bed ex-rna.bed  # r2g

Both input BED files can be found in biofast-data-v1.tar.gz from the download page.
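
As a rough illustration of what the task computes, the Python sketch below loads one set of intervals, then reports for each query interval the number of overlapping intervals and the number of query bases they cover. It is deliberately not the cgranges algorithm: it swaps in a simple sorted list with a binary search followed by a linear scan, so it is far slower on large inputs. The names `build_index` and `coverage` are hypothetical, intervals are assumed to be half-open as in BED, and BED file parsing is left out.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(intervals):
    """Group (chrom, start, end) tuples by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        index[chrom] = (starts, ivs)
    return index


def coverage(index, chrom, start, end):
    """Return (overlap count, covered bases) for the query [start, end)."""
    if chrom not in index:
        return 0, 0
    starts, ivs = index[chrom]
    # Only intervals that begin before the query ends can overlap it.
    hi = bisect_left(starts, end)
    n_overlaps = 0
    covered = 0
    cur_start = cur_end = None  # current merged block, clipped to the query
    for s, e in ivs[:hi]:
        if e <= start:
            continue  # ends before the query begins
        n_overlaps += 1
        s, e = max(s, start), min(e, end)
        if cur_end is None or s > cur_end:
            # Disjoint from the current block: flush it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return n_overlaps, covered
```

Dividing the covered bases by the query length `end - start` gives the covered fraction of the query.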

Program	Language	t_g2r (s)	M_g2r (MB)	t_r2g (s)	M_r2g (MB)
bedcov_c1_cgr.c	C	5.2	138.4	10.7	19.1
bedcov_cr1_klib.cr	Crystal	8.8	319.6	14.8	40.7
bedcov_nim1_klib.nim	Nim	16.6	248.4	26.0	34.1
bedcov_jl1_klib.jl	Julia	25.9	428.1	63.0	257.0
bedcov_go1.go	Go	34.0	318.9	21.8	47.3
bedcov_js1_cgr.js	Javascript	76.4	2219.9	80.0	316.8
bedcov_lua1_cgr.lua	LuaJIT	174.7	2668.0	218.9	364.6
bedcov_py1_cgr.py	PyPy	17332.9	1594.3	5481.2	256.8
bedcov_py1_cgr.py	Python	>33770.4	2317.6	>20722.0	313.7

For the full table and technical notes, see the bedcov directory.

FASTQ parsing

In this benchmark, we parse a 4-line FASTQ file consisting of 5,682,010 records and report the number of records and the total lengths of the sequences and of the quality strings. The input file is M_abscessus_HiSeq.fq in biofast-data-v1.tar.gz from the download page. In the table below, "t_gzip" gives the CPU time in seconds for gzip'd input and "t_plain" gives the time for raw input without compression.
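
The counting task itself can be sketched in a few lines of Python. This is a minimal illustration, not one of the benchmarked implementations: it handles strict 4-line FASTQ only (no multi-line records), and the names `open_fastq` and `count_fastq` are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip'd FASTQ file in text mode, chosen by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_fastq(lines):
    """Return (records, total sequence length, total quality length).

    `lines` is any iterable of text lines, such as an open file handle.
    Each record is assumed to span exactly four lines:
    '@' header, sequence, '+' separator, quality string.
    """
    it = iter(lines)
    n_records = seq_len = qual_len = 0
    for header in it:
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # skip the '+' separator line
        qual = next(it, "").rstrip("\r\n")
        n_records += 1
        seq_len += len(seq)
        qual_len += len(qual)
    return n_records, seq_len, qual_len
```

To run it on the benchmark input, pass the handle returned by `open_fastq("M_abscessus_HiSeq.fq")` to `count_fastq`.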

Program	Language	t_gzip (s)	t_plain (s)	Comments
fqcnt_rs2_needletail.rs	Rust	9.3	0.8	needletail; fasta/4-line fastq
fqcnt_c1_kseq.c	C	9.7	1.4	multi-line fasta/fastq
fqcnt_cr1_klib.cr	Crystal	9.7	1.5	kseq.h port
fqcnt_nim1_klib.nim	Nim	10.5	2.3	kseq.h port
fqcnt_jl1_klib.jl	Julia	11.2	2.9	kseq.h port
fqcnt_js1_k8.js	Javascript	17.5	9.4	kseq.h port
fqcnt_go1.go	Go	19.1	2.8	4-line only
fqcnt_lua1_klib.lua	LuaJIT	28.6	27.2	partial kseq.h port
fqcnt_py2_rfq.py	PyPy	28.9	14.6	partial kseq.h port
fqcnt_py2_rfq.py	Python	42.7	19.1	partial kseq.h port

For the full table and technical notes, see the fqcnt directory.
