Pysam Fetch Behavior
7.7 years ago

I want to fetch only certain columns from a tabix-indexed file. Using Pysam's fetch method, I get the entire row as a string. However, I would like to query only one specific column. Is there a quick way to accomplish this?

tabix • 5.6k views

Could you be more specific? Do you want to fetch only certain columns from a subset of rows, or do you just want certain columns from the whole data?


Use awk on the pysam output to extract the exact column.


If the OP is using pysam, it's a good guess that they are already in a Python environment. awk, while extremely useful, is a worse fit than simply accessing the AlignedRead object returned by the IteratorRow object returned by Samfile.fetch().

import pysam
samfile = pysam.Samfile('example.bam', 'rb')  # hypothetical indexed BAM file
region = samfile.fetch(region='chr1:1-1000')
positions = [read.pos for read in region]


I am specifically working with TSV files. Here is an example of a TSV file:

chr\tpos\tjudgement
1\t2389\tTrue
1\t2399\tTrue

When I use fetch to get the row for chr 1 and pos 2399, I get the following:

1\t2399\tTrue

However, I want a way to directly access the judgement value for chr 1 and pos 2399, that is, to call the exact column value. I cannot use awk. I was wondering if there is a direct way of doing so using Pysam. Thank you for all your help.
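For reference, each row that fetch yields by default is a plain tab-delimited string, so one parser-free option is to split it in Python. A small sketch using the example row from this post (column_value is a hypothetical helper, not a pysam function):

```python
def column_value(row, index, sep="\t"):
    """Return one column from a delimited row string (hypothetical helper)."""
    # Drop a trailing newline, if any, before splitting into fields.
    return row.rstrip("\n").split(sep)[index]

row = "1\t2399\tTrue"  # the row string from the example
judgement = column_value(row, 2)
print(judgement)
```

Note that the value comes back as a string, so it would still need converting if a real boolean is wanted.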

7.7 years ago
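# tabixfile is an open pysam.Tabixfile; fetch coordinates are 0-based, half-open
rows = tabixfile.fetch(reference='1', start=2398, end=2399, parser=pysam.asTuple())
judgement = next(rows)[2]  # fetch returns an iterator, so take the first row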

import pysam
tabixfile = pysam.Tabixfile( "/usr/local/xyz.vcf" )

Now, when I do this:

row = tabixfile.fetch(reference='chr1', start=249240539, end=249240539)

and then this:

x = row[2]

I am getting this error.

Where am I going wrong? Any pointers?
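
One likely cause, judging from the snippet: fetch returns an iterator over the matching rows rather than a single row, so indexing it directly fails. A minimal sketch of that behavior, using a plain Python generator as a stand-in for the pysam iterator (fake_fetch and its row are hypothetical, not part of pysam):

```python
def fake_fetch():
    """Hypothetical stand-in for tabixfile.fetch(...): yields matching rows lazily."""
    yield "1\t2399\tTrue"

rows = fake_fetch()

# Indexing the iterator itself fails, because an iterator is not a sequence.
try:
    rows[2]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Pull a row out of the iterator first, then split it into columns.
first_row = next(rows)
judgement = first_row.split("\t")[2]
print(indexing_failed, judgement)
```

It may also be worth checking the coordinates: pysam's start and end are 0-based and half-open, so a query with start equal to end may not select any rows at all.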