I have a bit of a weird issue trying to read out regions from a tabix file via pysam.
For various reasons, I am using Python 2.7.16:
$ python
Python 2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 21:00:58)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
I am using cython
0.24.1 and pysam
0.12.0:
$ pip install cython==0.24.1
$ pip install pysam==0.12.0
Here is my test script:
import sys
import pysam
fn="/net/seq/data/projects/genotyping/merged-donor-alignments/insights-100-donors-2019-03-19/data/Allcells_10729/DNase/footprints/interval.all.bedgraph.gz"
handle = pysam.TabixFile(fn)
for r in handle.fetch("chr1", 100000, 150000, parser=pysam.asTuple()):
sys.stderr.write("{}\n".format(r))
break
So far, so good:
$ python pysam-test-specificRange.py
chr1 137844 137845 0.0000 0.0000 0.2541 -0.0000 1.0000
Next, I change the handle's fetch
call, removing the range of interest:
for r in handle.fetch(parser=pysam.asTuple()):
sys.stderr.write("{}\n".format(r))
break
I get a ValueError
exception:
$ python pysam-test-allRegions.py
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pysam/libctabix.pyx", line 489, in pysam.libctabix.TabixFile.fetch
ValueError: could not create iterator, possible tabix version mismatch
I don't understand why there is a possible tabix version mismatch, as I can recover intervals if I specify the range manually.
Am I not using fetch
correctly, or is there some other issue in using the parser?
I don't know. Is there a way to query an archive to get the version used to make it? I tried digging into the tabix header specification but didn't see any version key I could extract. I will probably try recompressing/reindexing the test archive.
Recompressing and reindexing with htslib v1.7 seemed to work better with pysam. Much obliged for the code pointer.