Pysam - Different amount of mapped and aligned reads
0
0
Entering edit mode
4.7 years ago
AlchemistMoz ▴ 20

I've read up on the difference between mapped and aligned reads, but I'm still confused about why I'm getting different numbers for mapped and aligned reads.

So, with the python module Pysam, I can get the total amount of mapped reads in a BAM file with the following: samfile.mapped

Which is defined as:

mapped: int with total number of mapped alignments according to the statistics recorded in the index. This is a read-only attribute.

For my BAM i get 3 891 385 mapped reads.

But when I use the samfile.fetch() function to loop through the reads, I get a different number when I count the reads:

count = 0
for read in samfile.fetch():
count += 1


Definition for fetch:

fetch(self, reference=None, region=None...): fetch reads aligned in a region. Without a reference or region all mapped reads in the file will be fetched.

This gives me a total of 4 053 966 reads instead.

Can anyone explain why I'm getting a minor difference in the amount of reads? Because I wan't to calculate the % of multimapped reads (all secondary alignments), but I'm not sure which total number to use.

EDIT: Maybe also worth mentioning that for the same BAM file there are 1 033 401 unmapped reads

python pysam mapped alignment • 3.1k views
0
Entering edit mode

Try changing your program to the following and see what happens:

count = 0
for read in samfile.fetch():
count += 1


fetch() will return all entries in the file with an assigned chromosome. Often unmapped reads are given the coordinates of their mate, so those would be included and inflate the output.

0
Entering edit mode

I tried your code, printing count gave me 3 891 385 same amount as print samfile.mapped would give.

0
Entering edit mode

Okay, so you think that the extra amount of reads included in fetch are unmapped reads that have been given the coordinated of their mate?

0
Entering edit mode

The odds are good, yes. You can always print a couple.