I've read up on the difference between mapped and aligned reads, but I'm still confused about why I'm getting different numbers for mapped and aligned reads.
So, with the python module Pysam, I can get the total amount of mapped reads in a BAM file with the following:
Which is defined as:
mapped: int with total number of mapped alignments according to the statistics recorded in the index. This is a read-only attribute.
For my BAM i get
3 891 385 mapped reads.
But when I use the
samfile.fetch() function to loop through the reads, I get a different number when I count the reads:
count = 0 for read in samfile.fetch(): count += 1
Definition for fetch:
fetch(self, reference=None, region=None...): fetch reads aligned in a region. Without a reference or region all mapped reads in the file will be fetched.
This gives me a total of
4 053 966 reads instead.
Can anyone explain why I'm getting a minor difference in the amount of reads? Because I wan't to calculate the % of multimapped reads (all secondary alignments), but I'm not sure which total number to use.
EDIT: Maybe also worth mentioning that for the same BAM file there are
1 033 401 unmapped reads