Pysam pileup method gives more columns than expected
1
0
Entering edit mode
3.9 years ago

Hi all!

I'm trying to write a python script for counting reads at each position of a genomic region. I wanted to use the pysam pileup method but the output confuses me. In particular I seem to be getting way more pileup columns than I expected. If I understood it correctly I should get one column per position in the supplied region, however when running the code below this doesn't seem to be the case.

itr = bam_file.pileup("chr1",15216562,15216662)
counter = 0
for x in itr:
    counter += 1

The region is 100bp but the counter shows a value of 65531 after iterating over all pileup columns. Why does the counter not show 100? I'm probably misunderstanding something so very grateful for help with figuring this out!

Thanks!

pysam python • 1.1k views
ADD COMMENT
0
Entering edit mode
3.9 years ago
khorms ▴ 230

pysam pileup chooses all the reads that align to the selected region, then iterates through all the positions covered by these reads. Therefore, it's expected that the value you get in the end for counter is larger than 100 (since some reads extend beyond the region you have specified). I am not sure why the number you are getting is so large though; you could see what positions are you iterating through by printing the pos attribute print(x.pos)

ADD COMMENT
0
Entering edit mode

If the reads/contigs are long enough (it can be an assembly file), the number can be so large like that.

ADD REPLY

Login before adding your answer.

Traffic: 1787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6