Strobemers in Go

Introduction

This is a Go implementation of the strobemers (minstrobes and randstrobes), with some differences.

The implementation of Randstrobes has a not-bad performance (2-3X slower) compared to regular k-mer, while it's 10-20X slower than ntHash. Besides, Randstrobes is only slightly slower than MinStrobes (see benchmark).

Attention

The current implementation only computes strobemers of the positive strand, because the strobes are asymmetrical and the location matters.

Installation

go get github.com/shenwei356/strobemers

Quick Start

We followed the code style of ntHash.

n := 2
l := 3
w_min := 3
w_max := 5
rs, err := strobemers.NewRandStrobes(seq, n, l, w_min, w_max)
checkError(err)

var hash uint64
var ok bool
var i int  // 0-based index
var positions []int // 0-based indexes of all strobes

rs.SetWindowShrink(true)
for {
    hash, ok = rs.Next()
    if !ok {
        break
    }

    i = rs.Index()
    positions = rs.Indexes()
}

Differences

Here are some differences compared to the original implementation, see discussion: #1, #2.

item	orginal	this	comment
window range	`w_min < w_max`	`w_min <= w_max`	allow a fixed position
shrinking window	all `w_min` and `w_max`	optional shrinking last `w_max`	see figures below
number of strobemers	`len(seq)-n*l+1`	`len(seq)-nl+1-(n-1)l`	window shrinked
number of strobemers		`len(seq)-nl+1-(n-1)(l+w_min-1)`	window not shrinked
choice of min hash	`(h(m)+h(mj))%q`	`(h(m)+h(mj))&q`	`&` is faster than `%`
final hash value (n=2)	`h(m1)-h(m2)`	`h(m1)/2+h(m2)/3`	keep asymmetry and avoid `uint64` overflow
final hash value (n=3)	`h(m1)-h(m2)+2*h(m3)`	`h(m1)/3+h(m2)/4+h(m3)/5`	~

Benchmark

method	time	relative_time
ntHashKmers(30)	8590	1
Kmers(30)	55579	6
MinStrobes(2,15,20,30)	104520	12
MinStrobes(3,10,20,30)	111662	13
RandStrobes(2,15,20,30)	93436	11
RandStrobes(3,10,20,30)	152461	18

$ go test . -bench=Benchmark* -benchmem \
    | grep Bench \
    | perl -pe 's/\s\s+/\t/g' \
    | csvtk cut -Ht -f 1,3-5 \
    | csvtk add-header -t -n test,time,memory,allocs \
    | csvtk pretty -t -r

                                 test           time       memory        allocs
-------------------------------------   ------------   ----------   -----------
           BenchmarkNTHash/1.00_KB-16     8590 ns/op      48 B/op   1 allocs/op
            BenchmarkKmers/1.00_KB-16    55579 ns/op      32 B/op   1 allocs/op
 BenchmarkMinStrobesOrder2/1.00_KB-16   104520 ns/op   25064 B/op   7 allocs/op
 BenchmarkMinStrobesOrder3/1.00_KB-16   111662 ns/op   25064 B/op   7 allocs/op
BenchmarkRandStrobesOrder2/1.00_KB-16    93436 ns/op    8432 B/op   3 allocs/op
BenchmarkRandStrobesOrder3/1.00_KB-16   152461 ns/op    8432 B/op   3 allocs/op

Similar Projects

strobemer_cpptest

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

shenwei356 / strobemers

Programming Languages

Labels

Projects that are alternatives of or similar to strobemers