I am trying to calculate the percentage of reads aligned to a reference genome.
samtools flagstat and this is what I have
24887959 + 0 in total (QC-passed reads + QC-failed reads) 18203689 + 0 secondary 210060 + 0 supplementary 0 + 0 duplicates 22937586 + 0 mapped (92.16% : N/A) 6474210 + 0 paired in sequencing 3237105 + 0 read1 3237105 + 0 read2 2537654 + 0 properly paired (39.20% : N/A) 4022194 + 0 with itself and mate mapped 501643 + 0 singletons (7.75% : N/A) 1475914 + 0 with mate mapped to a different chr 736257 + 0 with mate mapped to a different chr (mapQ>=5)
However, I noticed that the total number of reads is different to the input fastq files (3237105 read 1 + 3237105 read 2 = 6474210). After some reading, the numbers reported in
samtools flagstat also include multiple mapped reads.
So I try this:
samtools view -c -F 260 999.sorted.PE.bam
-c count reads and print the total number -F bitcode skip reads with bits present in 'bitcode' -F 260 only output reads that are not unmapped (flag 4 is not set) and only primary alignment (flag 256 is not set)
4733897 is what I got so if I divide this number with the total read number from fastq files (6474210) I have 0.731 (x 100 = 73.1%).
Is this the correct number to report as a percentage of mapped read? Not the one that's reported in samtools flagstat (92.16%)?