Question: Pysam pileup method gives more columns than expected
gravatar for Didrik.Olofsson
7 months ago by
Didrik.Olofsson0 wrote:

Hi all!

I'm trying to write a python script for counting reads at each position of a genomic region. I wanted to use the pysam pileup method but the output confuses me. In particular I seem to be getting way more pileup columns than I expected. If I understood it correctly I should get one column per position in the supplied region, however when running the code below this doesn't seem to be the case.

itr = bam_file.pileup("chr1",15216562,15216662)
counter = 0
for x in itr:
    counter += 1

The region is 100bp but the counter shows a value of 65531 after iterating over all pileup columns. Why does the counter not show 100? I'm probably misunderstanding something so very grateful for help with figuring this out!


pysam python • 246 views
ADD COMMENTlink modified 7 months ago by khorms210 • written 7 months ago by Didrik.Olofsson0
gravatar for khorms
7 months ago by
khorms210 wrote:

pysam pileup chooses all the reads that align to the selected region, then iterates through all the positions covered by these reads. Therefore, it's expected that the value you get in the end for counter is larger than 100 (since some reads extend beyond the region you have specified). I am not sure why the number you are getting is so large though; you could see what positions are you iterating through by printing the pos attribute print(x.pos)

ADD COMMENTlink written 7 months ago by khorms210

If the reads/contigs are long enough (it can be an assembly file), the number can be so large like that.

ADD REPLYlink written 4 months ago by jianzhi05150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1220 users visited in the last hour