I've read up on the difference between mapped and aligned reads, but I'm still confused about why I'm getting different numbers for mapped and aligned reads.
So, with the python module Pysam, I can get the total amount of mapped reads in a BAM file with the following: samfile.mapped
Which is defined as:
mapped: int with total number of mapped alignments according to the statistics recorded in the index. This is a read-only attribute.
For my BAM i get 3 891 385
mapped reads.
But when I use the samfile.fetch()
function to loop through the reads, I get a different number when I count the reads:
count = 0
for read in samfile.fetch():
count += 1
Definition for fetch:
fetch(self, reference=None, region=None...): fetch reads aligned in a region. Without a reference or region all mapped reads in the file will be fetched.
This gives me a total of 4 053 966
reads instead.
Can anyone explain why I'm getting a minor difference in the amount of reads? Because I wan't to calculate the % of multimapped reads (all secondary alignments), but I'm not sure which total number to use.
EDIT: Maybe also worth mentioning that for the same BAM file there are 1 033 401
unmapped reads
Try changing your program to the following and see what happens:
fetch()
will return all entries in the file with an assigned chromosome. Often unmapped reads are given the coordinates of their mate, so those would be included and inflate the output.I tried your code, printing count gave me
3 891 385
same amount asprint samfile.mapped
would give.Okay, so you think that the extra amount of reads included in fetch are unmapped reads that have been given the coordinated of their mate?
The odds are good, yes. You can always print a couple.