All Projects → shenwei356 → strobemers

shenwei356 / strobemers

Licence: MIT license
A Go implementation of the strobemers (https://github.com/ksahlin/strobemers)

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to strobemers

StrobeAlign
Aligns short reads using dynamic seed size with strobemers
Stars: ✭ 49 (+276.92%)
Mutual labels:  strobemers
ultra
uLTRA is a long-read splice aligner with high accuracy from using a guiding annotation
Stars: ✭ 47 (+261.54%)
Mutual labels:  strobemers

Strobemers in Go

GoDoc Go Report Card

Introduction

This is a Go implementation of the strobemers (minstrobes and randstrobes), with some differences.

The implementation of Randstrobes has a not-bad performance (2-3X slower) compared to regular k-mer, while it's 10-20X slower than ntHash. Besides, Randstrobes is only slightly slower than MinStrobes (see benchmark).

Attention

The current implementation only computes strobemers of the positive strand, because the strobes are asymmetrical and the location matters.

Installation

go get github.com/shenwei356/strobemers

Quick Start

We followed the code style of ntHash.

n := 2
l := 3
w_min := 3
w_max := 5
rs, err := strobemers.NewRandStrobes(seq, n, l, w_min, w_max)
checkError(err)

var hash uint64
var ok bool
var i int  // 0-based index
var positions []int // 0-based indexes of all strobes

rs.SetWindowShrink(true)
for {
    hash, ok = rs.Next()
    if !ok {
        break
    }

    i = rs.Index()
    positions = rs.Indexes()
}

Differences

Here are some differences compared to the original implementation, see discussion: #1, #2.

item orginal this comment
window range w_min < w_max w_min <= w_max allow a fixed position
shrinking window all w_min and w_max optional shrinking last w_max see figures below
number of strobemers len(seq)-n*l+1 len(seq)-n*l+1-(n-1)*l window shrinked
number of strobemers len(seq)-n*l+1-(n-1)*(l+w_min-1) window not shrinked
choice of min hash (h(m)+h(mj))%q (h(m)+h(mj))&q & is faster than %
final hash value (n=2) h(m1)-h(m2) h(m1)/2+h(m2)/3 keep asymmetry and avoid uint64 overflow
final hash value (n=3) h(m1)-h(m2)+2*h(m3) h(m1)/3+h(m2)/4+h(m3)/5 ~

Benchmark

method time relative_time
ntHashKmers(30) 8590 1
Kmers(30) 55579 6
MinStrobes(2,15,20,30) 104520 12
MinStrobes(3,10,20,30) 111662 13
RandStrobes(2,15,20,30) 93436 11
RandStrobes(3,10,20,30) 152461 18
$ go test . -bench=Benchmark* -benchmem \
    | grep Bench \
    | perl -pe 's/\s\s+/\t/g' \
    | csvtk cut -Ht -f 1,3-5 \
    | csvtk add-header -t -n test,time,memory,allocs \
    | csvtk pretty -t -r

                                 test           time       memory        allocs
-------------------------------------   ------------   ----------   -----------
           BenchmarkNTHash/1.00_KB-16     8590 ns/op      48 B/op   1 allocs/op
            BenchmarkKmers/1.00_KB-16    55579 ns/op      32 B/op   1 allocs/op
 BenchmarkMinStrobesOrder2/1.00_KB-16   104520 ns/op   25064 B/op   7 allocs/op
 BenchmarkMinStrobesOrder3/1.00_KB-16   111662 ns/op   25064 B/op   7 allocs/op
BenchmarkRandStrobesOrder2/1.00_KB-16    93436 ns/op    8432 B/op   3 allocs/op
BenchmarkRandStrobesOrder3/1.00_KB-16   152461 ns/op    8432 B/op   3 allocs/op

Similar Projects

References

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].