Question: Pysam pileup dropping reads
0
gravatar for yryan
10 months ago by
yryan0
yryan0 wrote:

Hi folks,

I'm trying to use pysam's pileup in order to count some base changes occurring at certain points in my transcript of interest using MinION data. However my coverage at each base is often three times as large as the total of reads in my final counts and I'm trying to figure out this discrepancy.

My code is the following:

def stats_4605(input_file):
A = 0
T = 0
C = 0
G = 0
total = 0
samfile = pysam.AlignmentFile(input_file, "rb")
for pileupcolumn in samfile.pileup("ref", 4604, 4605):
    if pileupcolumn.pos != 4604:
        continue
    else:
        print(input_file, "\nCoverage at base %s = %s" %
               (pileupcolumn.pos, pileupcolumn.n))
        for pileupread in pileupcolumn.pileups:
            total += 1
            if not pileupread.is_del and not pileupread.is_refskip:
                if pileupread.alignment.query_sequence[pileupread.query_position] == "A":
                    A += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "T":
                    T += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "C":
                    C += 1
                elif pileupread.alignment.query_sequence[pileupread.query_position] == "G":
                    G += 1
        print("Total counted ", total)
        print("A: ", A, "T: ", T, "C: ", C, "G: ", G)
        print("T:A ratio = ", T/A)
        print('')

samfile.close()

An example of the output I get is:

Coverage at base 4604 = 11547
Total counted  4502
A:  3063 T:  577 C:  4 G:  301
T:A ratio =  0.18837740777015996

So far I think the problem is with the line if not pileupread.is_del and not pileupread.is_refskip:

Deleting pileup.is refskip shows no changes, but pileupread.is_del gives an error message:

    Traceback (most recent call last):
  File "/home/user/PycharmProjects/samp_comp_extractor/4605_a_vs_t.py", line 36, in <module>
    stats_4605("6hr_mapped_sorted.bam")
  File "/home/user/PycharmProjects/samp_comp_extractor/4605_a_vs_t.py", line 20, in stats_4605
    if pileupread.alignment.query_sequence[pileupread.query_position] == "A":
TypeError: string indices must be integers

Any idea if these are related or how I could get around these issues? I've tried generating a VCF for these reads but can't get it to produce an AF segment and would prefer more in depth numbers for my use case.

Cheers!

pileup pysam • 374 views
ADD COMMENTlink written 10 months ago by yryan0

The problem with the code is that pileupread.query_position is not an integer. Not sure why.

ADD REPLYlink written 10 months ago by i.sudbery9.8k

Oh.... Because if the base is deleted from the "query" (i.e. the read), then it has no position in the query_sequence.

ADD REPLYlink written 10 months ago by i.sudbery9.8k

Eyeballing it in a genomeviewer there are some deleted bases but looks do be much closer to 1 in 10 reads rather than 5-6/10 reads which my script is returning. I'm not sure if its a nanopore issue since it is only single reads and does have what looks to be many deletions but not enough over my region of interest to cause this drop in reads

ADD REPLYlink written 10 months ago by yryan0

Yeah, sorry, my comment was over what was causing the error from your code, not what was causing the underlying difference in read counts.

I noticed from the pysam documentation that n() includes bases that don't pass the quality filter. Perhaps pileups excludes these? What does get_num_aligned() return?

ADD REPLYlink written 10 months ago by i.sudbery9.8k

It gives me the same number that my print("Total counted ", total) line gives me. But in the docs it says :

This method applies a base quality filter and the number is equal to the size of get_query_sequences(), get_mapping_qualities(), etc.

This makes me think its a quality score issue, I'll have a dig around seeing if its possibly to set that cutoff lower but if you have any advice there that would be great!

ADD REPLYlink written 10 months ago by yryan0

I've found set_min_base_quality(self, min_base_quality) and added that to my code on line 15 just after the else which now works perfectly. Thanks for helping point me in the right direction!

ADD REPLYlink written 10 months ago by yryan0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1521 users visited in the last hour