Tool: pyncls not dead! The nested containment list data structure (faster than interval trees) revived
6.1 years ago
endrebak ▴ 960

https://github.com/endrebak/ncls

Just released. Bug reports welcome.

I should produce better timings; I see that I did not explore the full space of possibilities in the first announcement. It still seems to be many times faster than interval trees for most of my uses, though.

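from ncls import NCLS

import pandas as pd

# Five intervals: 0-100, 1-101, ..., 4-104
starts = pd.Series(range(0, 5))
ends = starts + 100
ids = starts

# Build the nested containment list from arrays of starts, ends and ids
ncls = NCLS(starts.values, ends.values, ids.values)

# Find the intervals overlapping 0-2; each hit is (start, end, id)
it = ncls.find_overlap(0, 2)
for i in it:
    print(i)
# (0, 100, 0)
# (1, 101, 1)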
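For sanity-checking results on small inputs, a brute-force reference can be handy. The sketch below is not part of ncls; `brute_force_overlap` is a hypothetical helper name, and it assumes half-open intervals (end excluded), which is what the output above suggests, since the interval starting at 2 is not reported for the query 0-2.

```python
def brute_force_overlap(starts, ends, ids, query_start, query_end):
    """Return (start, end, id) for every interval overlapping [query_start, query_end).

    Assumes half-open intervals: two intervals overlap when each one
    starts before the other one ends.
    """
    return [
        (s, e, i)
        for s, e, i in zip(starts, ends, ids)
        if s < query_end and e > query_start
    ]


# Same five intervals as in the example above, as plain Python lists.
starts = list(range(0, 5))
ends = [s + 100 for s in starts]
ids = starts

for hit in brute_force_overlap(starts, ends, ids, 0, 2):
    print(hit)
# (0, 100, 0)
# (1, 101, 1)
```

This is linear in the number of intervals per query, so it is only useful as a correctness check, not as a baseline for timings.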
nested-containment-list python • 1.6k views

I think it's supposed to be faster for short intervals?


Yes, brentp kindly pointed out some possible errors in my timings. Still, it seems to be many times faster for most use cases I have tried. GenomicRanges has also switched from interval trees to nested containment lists.
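For anyone unfamiliar with the structure, here is a rough pure-Python sketch of the nested containment list idea. It is not the ncls implementation (which wraps compiled code); the names `Level`, `build_ncl` and `query_ncl` are made up for illustration, and intervals are assumed to be half-open. Intervals contained in another interval are pushed into that interval's sublist, so within any one sublist both starts and ends are ascending and a query can binary-search for the first candidate.

```python
from bisect import bisect_right


class Level:
    """One sublist: no interval here contains another, so starts and ends both ascend."""

    def __init__(self):
        self.items = []  # (start, end, id, child Level)
        self.ends = []   # parallel list of ends, used for binary search


def build_ncl(intervals):
    """Build a nested containment list from (start, end, id) tuples."""
    # Sort by start ascending, then end descending, so containers come before their contents.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    root = Level()
    stack = []  # chain of (end, child Level) for intervals that may contain the next one
    for start, end, ident in ordered:
        # Drop candidates that end too early to contain the current interval.
        while stack and stack[-1][0] < end:
            stack.pop()
        level = stack[-1][1] if stack else root
        children = Level()
        level.items.append((start, end, ident, children))
        level.ends.append(end)
        stack.append((end, children))
    return root


def query_ncl(level, query_start, query_end, hits=None):
    """Collect every (start, end, id) overlapping [query_start, query_end)."""
    if hits is None:
        hits = []
    # First interval in this sublist whose end lies past the query start.
    i = bisect_right(level.ends, query_start)
    while i < len(level.items):
        start, end, ident, children = level.items[i]
        if start >= query_end:
            break  # starts ascend, so nothing further in this sublist can overlap
        hits.append((start, end, ident))
        # Only intervals nested inside an overlapping interval can overlap too.
        query_ncl(children, query_start, query_end, hits)
        i += 1
    return hits


intervals = [(0, 10, "a"), (2, 5, "b"), (3, 4, "c"), (6, 8, "d"), (12, 15, "e")]
ncl = build_ncl(intervals)
print(query_ncl(ncl, 4, 7))
# [(0, 10, 'a'), (2, 5, 'b'), (6, 8, 'd')]
```

Sublists are only scanned when their parent overlaps the query, which is where the savings over checking every interval come from in this sketch.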


Could you elaborate on use cases for this?


To me, bioinformatics is basically range overlap, so this speeds up all of my work considerably. My reason for wrapping it is that I am writing a GenomicRanges for Python and want incredibly fast object creation and overlap/intersection queries.
