pysam doesn't calculate correct nucleotide quantities from .bam file
1
1
Entering edit mode
3.6 years ago
Gama313 ▴ 120

I am very new to pysam, but I am trying to generate a table in which are listed all the nucleotides by genomic position divided by strand with python. I am working on amplicon data, so it would be possible that pysam would mark them as duplicates (as default). By now I simply re-wrote codes from a previous post (https://www.biostars.org/p/215192/#217314), but counts are not correct. In particular, I observed that only nucleotides observed in Read1 are reported (and are not fully correct), but counts from Read2 are almost absent, even if there are some positions with high counts (but presumably wrong). Anyone experienced such a strange behaviour? Below is reported the code for simplicity: import pysam

samfile = pysam.Samfile("reads.sorted.bam", "rb")
mystring = ''
mylist = []
for pileupcolumn in samfile.pileup("gi|251831106|ref|NC_012920.1|"):        
    mystring = ''
    for pileupread in pileupcolumn.pileups:
        if not pileupread.is_del and not pileupread.is_refskip:                
            mystring = mystring + pileupread.alignment.query_sequence[pileupread.query_position]
            mystring = "".join(sorted(list(mystring)))  # Sort alphabetically
            mylist.append((pileupcolumn.pos+1,mystring))
samfile.close()
pysam count python • 1.1k views
ADD COMMENT
0
Entering edit mode
3.6 years ago

Some random thoughts looking at the documentation for pysam/pileup:

  • max_depth: default to 8000, maybe you need to increase this limit

  • ignore_overlaps: the default is True and it may explain why only read 1 is counted

  • flag_filter: Maybe you want to disable this filter and count all reads

ADD COMMENT
0
Entering edit mode

It worked like a charm. Thanks a lot, I didn't this section. Bests

ADD REPLY

Login before adding your answer.

Traffic: 2885 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6