
biocore-ntnu / Ncls

Licence: bsd-3-clause

The Nested Containment List for Python. Basically a static interval-tree that is silly fast for both construction and lookups.

Programming Languages

python


Labels

numpy

Projects that are alternatives of or similar to Ncls

One Python Benchmark Per Day

An ongoing fun challenge where I'll try to post one Python benchmark per day.

Stars: ✭ 124 (-12.68%)

Mutual labels: numpy

Ds Ai Tech Notes

📖 [Translation] Data science and artificial intelligence technical notes

Stars: ✭ 131 (-7.75%)

Mutual labels: numpy

Facedetection

🌟 Human Face Detection based on AdaBoost

Stars: ✭ 137 (-3.52%)

Mutual labels: numpy

Data Science For Marketing Analytics

Achieve your marketing goals with the data analytics power of Python

Stars: ✭ 127 (-10.56%)

Mutual labels: numpy

Forpy

Forpy - use Python from Fortran

Stars: ✭ 129 (-9.15%)

Mutual labels: numpy

Jyni

Enables Jython to load native CPython extensions.

Stars: ✭ 131 (-7.75%)

Mutual labels: numpy

Prusacontrol

PrusaControl is an alternative user interface for Slic3r Prusa Edition

Stars: ✭ 123 (-13.38%)

Mutual labels: numpy

Python Cheat Sheet

Python Cheat Sheet NumPy, Matplotlib

Stars: ✭ 1,739 (+1124.65%)

Mutual labels: numpy

Root numpy

The interface between ROOT and NumPy

Stars: ✭ 130 (-8.45%)

Mutual labels: numpy

Veros

The versatile ocean simulator, in pure Python, powered by Bohrium.

Stars: ✭ 136 (-4.23%)

Mutual labels: numpy

Color Tracker

Color tracking with OpenCV

Stars: ✭ 128 (-9.86%)

Mutual labels: numpy

Tiny ml

Algorithms from Zhou Zhihua's book "Machine Learning", plus some other traditional machine learning algorithms, implemented with numpy

Stars: ✭ 129 (-9.15%)

Mutual labels: numpy

Machine Learning Projects

This repository consists of all my Machine Learning Projects.

Stars: ✭ 135 (-4.93%)

Mutual labels: numpy

Teaching Monolith

Data science teaching materials

Stars: ✭ 126 (-11.27%)

Mutual labels: numpy

Irwin

irwin - the protector of lichess from all chess players villainous

Stars: ✭ 138 (-2.82%)

Mutual labels: numpy

From Python To Numpy

An open-access book on numpy vectorization techniques, Nicolas P. Rougier, 2017

Stars: ✭ 1,728 (+1116.9%)

Mutual labels: numpy

Pyjson tricks

Extra features for Python's JSON: comments, order, numpy, pandas, datetimes, and many more! Simple but customizable.

Stars: ✭ 131 (-7.75%)

Mutual labels: numpy

Data Analysis

Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.

Stars: ✭ 142 (+0%)

Mutual labels: numpy

Nptdms

NumPy based Python module for reading TDMS files produced by LabView

Stars: ✭ 138 (-2.82%)

Mutual labels: numpy

Ml Cheatsheet

A constantly updated python machine learning cheatsheet

Stars: ✭ 136 (-4.23%)

Mutual labels: numpy


Nested containment list

The Nested Containment List is a data structure for interval overlap queries, like the interval tree. It is usually an order of magnitude faster than the interval tree, both for building and for query lookups.
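To make the idea concrete, here is a minimal pure-Python sketch of a nested containment list. It is not the ncls code (which runs in C/Cython), and the names Node, build_nclist and query_nclist are hypothetical. It assumes half-open [start, end) intervals, which is consistent with the find_overlap(0, 2) example in the Usage section (the intervals starting at 0 and 1 are returned, the one starting at 2 is not). Intervals are sorted by start ascending and end descending, each one is attached under the nearest open interval that contains it, and a query binary-searches only the sublists whose container overlaps.

```python
from bisect import bisect_right
from typing import List, Tuple

# (start, end, id) with half-open coordinates [start, end).
Interval = Tuple[int, int, int]


class Node:
    """An interval plus the sublist of intervals nested inside it."""

    __slots__ = ("start", "end", "ident", "children", "child_ends")

    def __init__(self, start: int, end: int, ident: int) -> None:
        self.start = start
        self.end = end
        self.ident = ident
        self.children: List["Node"] = []
        self.child_ends: List[int] = []  # ends of children, in increasing order


def build_nclist(intervals: List[Interval]) -> Node:
    """Build an NCList; the returned sentinel root holds the top-level list."""
    root = Node(0, 0, -1)  # sentinel, never reported by queries
    stack: List[Node] = []  # chain of open intervals containing the current one
    # Start ascending, end descending: a container is seen before its contents.
    for start, end, ident in sorted(intervals, key=lambda iv: (iv[0], -iv[1])):
        # Discard open intervals that end before this one (they cannot contain it).
        while stack and stack[-1].end < end:
            stack.pop()
        parent = stack[-1] if stack else root
        node = Node(start, end, ident)
        parent.children.append(node)
        parent.child_ends.append(end)
        stack.append(node)
    return root


def query_nclist(parent: Node, qstart: int, qend: int) -> List[Interval]:
    """Return all intervals under `parent` that overlap [qstart, qend)."""
    hits: List[Interval] = []
    children = parent.children
    # First child whose end lies past qstart; earlier siblings end too soon.
    i = bisect_right(parent.child_ends, qstart)
    # Siblings are sorted by start, so stop at the first one starting >= qend.
    while i < len(children) and children[i].start < qend:
        child = children[i]
        hits.append((child.start, child.end, child.ident))
        # Nested intervals can only overlap the query if their container does.
        hits.extend(query_nclist(child, qstart, qend))
        i += 1
    return hits


subjects = [(0, 100, 0), (1, 101, 1), (2, 102, 2), (3, 103, 3), (4, 104, 4)]
print(query_nclist(build_nclist(subjects), 0, 2))  # [(0, 100, 0), (1, 101, 1)]
```

Because siblings in one sublist never contain each other, their end coordinates are strictly increasing as well as their starts, which is what lets bisect jump straight to the first candidate in each sublist.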

The implementation here is a revived version of the one used in the now-defunct PyGr library, which died of bitrot. I have made it less memory-consuming and created wrapper functions that allow batch-querying the NCLS for further speed gains.

It was implemented to be the cornerstone of the PyRanges project, but I have made it available to the Python community as a stand-alone library. Enjoy.

Original paper: https://academic.oup.com/bioinformatics/article/23/11/1386/199545
Cite: http://dx.doi.org/10.1093/bioinformatics/btz615

Cite

If you use this library in published research, please cite:

http://dx.doi.org/10.1093/bioinformatics/btz615

Install

pip install ncls

Usage

from ncls import NCLS

import pandas as pd

starts = pd.Series(range(0, 5))
ends = starts + 100
ids = starts

subject_df = pd.DataFrame({"Start": starts, "End": ends}, index=ids)

print(subject_df)
#    Start  End
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104

ncls = NCLS(starts.values, ends.values, ids.values)

# python API, slower
it = ncls.find_overlap(0, 2)
for i in it:
    print(i)
# (0, 100, 0)
# (1, 101, 1)

starts_query = pd.Series([1, 3])
ends_query = pd.Series([52, 14])
indexes_query = pd.Series([10000, 100])

query_df = pd.DataFrame({"Start": starts_query.values, "End": ends_query.values}, index=indexes_query.values)

print(query_df)
#        Start  End
# 10000      1   52
# 100        3   14


# everything done in C/Cython; faster
l_idxs, r_idxs = ncls.all_overlaps_both(starts_query.values, ends_query.values, indexes_query.values)
print((l_idxs, r_idxs))
# (array([10000, 10000, 10000, 10000, 10000,   100,   100,   100,   100,
#          100]), array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))

print(query_df.loc[l_idxs])
#        Start  End
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 100        3   14
# 100        3   14
# 100        3   14
# 100        3   14
# 100        3   14
print(subject_df.loc[r_idxs])
#    Start  End
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104

# return intervals in python (slow/mem-consuming)
intervals = ncls.intervals()
print(intervals)
# [(0, 100, 0), (1, 101, 1), (2, 102, 2), (3, 103, 3), (4, 104, 4)]
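When checking all_overlaps_both on small inputs, a brute-force reference can be handy. The sketch below uses NumPy broadcasting to test every query/subject pair with the half-open rule start < other_end and other_start < end, and returns two aligned id arrays shaped like the l_idxs, r_idxs pair above. all_overlaps_brute_force is a hypothetical helper, not part of ncls; it assumes half-open coordinates, and its output order (grouped by query) is not guaranteed to match the order ncls returns for arbitrary inputs, so compare sorted pairs.

```python
import numpy as np


def all_overlaps_brute_force(q_starts, q_ends, q_ids, s_starts, s_ends, s_ids):
    """Return (query_ids, subject_ids) for every overlapping query/subject pair.

    Builds a full len(queries) x len(subjects) boolean matrix, so it is only
    meant for cross-checking small inputs. Intervals are half-open [start, end).
    """
    q_starts, q_ends = np.asarray(q_starts), np.asarray(q_ends)
    s_starts, s_ends = np.asarray(s_starts), np.asarray(s_ends)
    # overlaps[i, j] is True when query i and subject j share at least one position.
    overlaps = (q_starts[:, None] < s_ends[None, :]) & (s_starts[None, :] < q_ends[:, None])
    q_idx, s_idx = np.nonzero(overlaps)  # row-major order: pairs grouped by query
    return np.asarray(q_ids)[q_idx], np.asarray(s_ids)[s_idx]


l_idxs, r_idxs = all_overlaps_brute_force(
    [1, 3], [52, 14], [10000, 100],        # the queries in query_df
    range(5), range(100, 105), range(5),   # the subjects in subject_df
)
print(l_idxs.tolist())  # [10000, 10000, 10000, 10000, 10000, 100, 100, 100, 100, 100]
print(r_idxs.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```

The quadratic matrix is exactly what the NCLS avoids, which is why this is only suitable as a test oracle.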

There is also an experimental floating-point version of the NCLS called FNCLS. See the examples folder.

Benchmark

Test file of 100 million intervals (created by sampling the GENCODE GTF with replacement):

Library	Function	Time (s)	Memory (GB)
bx-python	build	161.7	2.5
ncls	build	3.15	0.5
bx-python	overlap	148.4	4.3
ncls	overlap	7.2	0.5

Compared with bx-python, building is about 50 times faster and overlap queries are about 20 times faster, while memory usage is one fifth (build) and roughly one ninth (overlap).
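The ratios in the sentence above can be re-derived from the benchmark table. This small sketch only copies the table's numbers; the BENCHMARK dict and the ratios helper are illustrative names, not part of ncls.

```python
from typing import Dict, Tuple

# (time in seconds, memory in GB), copied from the benchmark table above.
BENCHMARK: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("bx-python", "build"): (161.7, 2.5),
    ("ncls", "build"): (3.15, 0.5),
    ("bx-python", "overlap"): (148.4, 4.3),
    ("ncls", "overlap"): (7.2, 0.5),
}


def ratios(step: str) -> Tuple[float, float]:
    """Return (speedup, memory fraction) of ncls relative to bx-python."""
    bx_time, bx_mem = BENCHMARK[("bx-python", step)]
    nc_time, nc_mem = BENCHMARK[("ncls", step)]
    return bx_time / nc_time, nc_mem / bx_mem


for step in ("build", "overlap"):
    speedup, mem_fraction = ratios(step)
    print(f"{step}: {speedup:.1f}x faster, memory 1/{1 / mem_fraction:.1f}")
# build: 51.3x faster, memory 1/5.0
# overlap: 20.6x faster, memory 1/8.6
```

The overlap memory ratio comes out at about 1/8.6, which the summary rounds to one ninth.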

Original paper

Alexander V. Alekseyenko, Christopher J. Lee; Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases, Bioinformatics, Volume 23, Issue 11, 1 June 2007, Pages 1386–1393, https://doi.org/10.1093/bioinformatics/btl647


Stars: ✭ 142
