Parsing tabix with pysam
1
0
Entering edit mode
2.3 years ago

I have a bit of a weird issue trying to read out regions from a tabix file via pysam.

For various reasons, I am using Python 2.7.16:

$ python
Python 2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 21:00:58)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.

I am using cython 0.24.1 and pysam 0.12.0:

$ pip install cython==0.24.1
$ pip install pysam==0.12.0

Here is my test script:

import sys
import pysam
fn="/net/seq/data/projects/genotyping/merged-donor-alignments/insights-100-donors-2019-03-19/data/Allcells_10729/DNase/footprints/interval.all.bedgraph.gz"
handle = pysam.TabixFile(fn)
for r in handle.fetch("chr1", 100000, 150000, parser=pysam.asTuple()):
    sys.stderr.write("{}\n".format(r))
    break

So far, so good:

$ python pysam-test-specificRange.py
chr1  137844  137845  0.0000  0.0000  0.2541  -0.0000 1.0000

Next, I change the handle's fetch call, removing the range of interest:

for r in handle.fetch(parser=pysam.asTuple()):
    sys.stderr.write("{}\n".format(r))
    break

I get a ValueError exception:

$ python pysam-test-allRegions.py
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pysam/libctabix.pyx", line 489, in pysam.libctabix.TabixFile.fetch
ValueError: could not create iterator, possible tabix version mismatch

I don't understand why there is a possible tabix version mismatch, as I can recover intervals if I specify the range manually.

Am I not using fetch correctly, or is there some other issue in using the parser?

pysam tabix • 1.1k views
ADD COMMENT
1
Entering edit mode
2.3 years ago

This message is specific to calling fetch() with no region given. From the pysam code:

if region is None:
    if len(self.contigs) > 0:
        # when accessing a tabix file created prior tabix 1.0
        # the full-file iterator is empty.
        raise ValueError(
            "could not create iterator, possible tabix version mismatch")
    else:
        # possible reason is that the file is empty - return an empty iterator
        return EmptyIterator()

Apparently the old non-HTSlib-based standalone tabix produced indexes that could not be queried without giving a specific region.

What version of tabix was used to index your input file? What happens if you reindex it with a tabix from current HTSlib?

ADD COMMENT
0
Entering edit mode

I don't know. Is there a way to query an archive to get the version used to make it? I tried digging into the tabix header specification but didn't see any version key I could extract. I will probably try recompressing/reindexing the test archive.

ADD REPLY
0
Entering edit mode

Recompressing and reindexing with htslib v1.7 seemed to work better with pysam. Much obliged for the code pointer.

ADD REPLY

Login before adding your answer.

Traffic: 1540 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6