Question

Pysam pileup method gives more columns than expected

0

Entering edit mode

3.9 years ago

Didrik.Olofsson • 0

Hi all!

I'm trying to write a python script for counting reads at each position of a genomic region. I wanted to use the pysam pileup method but the output confuses me. In particular I seem to be getting way more pileup columns than I expected. If I understood it correctly I should get one column per position in the supplied region, however when running the code below this doesn't seem to be the case.

itr = bam_file.pileup("chr1",15216562,15216662)
counter = 0
for x in itr:
    counter += 1

The region is 100bp but the counter shows a value of 65531 after iterating over all pileup columns. Why does the counter not show 100? I'm probably misunderstanding something so very grateful for help with figuring this out!

Thanks!

pysam python • 1.1k views

ADD COMMENT • link updated 3.9 years ago by khorms ▴ 230 • written 3.9 years ago by Didrik.Olofsson • 0

score 0 · Answer 1 · 2020-06-21

0

Entering edit mode

3.9 years ago

khorms ▴ 230

pysam pileup chooses all the reads that align to the selected region, then iterates through all the positions covered by these reads. Therefore, it's expected that the value you get in the end for counter is larger than 100 (since some reads extend beyond the region you have specified). I am not sure why the number you are getting is so large though; you could see what positions are you iterating through by printing the pos attribute print(x.pos)

ADD COMMENT • link 3.9 years ago by khorms ▴ 230

0

Entering edit mode

If the reads/contigs are long enough (it can be an assembly file), the number can be so large like that.

ADD REPLY • link 3.6 years ago by jianzhi0515 • 0