Pysam pileup method gives more columns than expected
1
0
Entering edit mode
23 months ago

Hi all!

I'm trying to write a python script for counting reads at each position of a genomic region. I wanted to use the pysam pileup method but the output confuses me. In particular I seem to be getting way more pileup columns than I expected. If I understood it correctly I should get one column per position in the supplied region, however when running the code below this doesn't seem to be the case.

itr = bam_file.pileup("chr1",15216562,15216662)
counter = 0
for x in itr:
counter += 1


The region is 100bp but the counter shows a value of 65531 after iterating over all pileup columns. Why does the counter not show 100? I'm probably misunderstanding something so very grateful for help with figuring this out!

Thanks!

pysam python • 703 views
0
Entering edit mode
23 months ago
khorms ▴ 210

pysam pileup chooses all the reads that align to the selected region, then iterates through all the positions covered by these reads. Therefore, it's expected that the value you get in the end for counter is larger than 100 (since some reads extend beyond the region you have specified). I am not sure why the number you are getting is so large though; you could see what positions are you iterating through by printing the pos attribute print(x.pos)

0
Entering edit mode

If the reads/contigs are long enough (it can be an assembly file), the number can be so large like that.