I am having a problem understanding the output of tophat concerning the multi-mapped reads. I would like to create a barplot of the unique vs. multi-mapped vs. unmapped reads from tophat-mapped bam files. (BTW is there already a tool for doing that?)
Now I wanted to extract the multi-mapped reads from one bam file. lat us say MW6. this is the align_summary.txt file I have:
Reads: Input : 26140314 Mapped : 25159791 (96.2% of input) of these: 1027691 ( 4.1%) have multiple alignments (1832 have >20) 96.2% overall read mapping rate.
But when I try to count the reads manually I get this :
$samtools view bamFiles/MW6.sorted.bam | grep -w -v "NH:i:1" | wc -l 4118642
and when I use the flagstat option from samtools:
$samtools flagstat bamFiles/MW6.SE.sorted.bam 28250742 + 0 in total (QC-passed reads + QC-failed reads) 3090951 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 28250742 + 0 mapped (100.00% : N/A) 0 + 0 paired in sequencing ... (the rest is 0)
The total number of input reads is 26140314 (as it says in the
align_summary file). The total mapped reads from the bam file is 28250742, as says in the
samtools flagstat. I guess this conflict is due to reads which mapped multiple times, but how can I get the exact number of multiple reads?
is it just 28250742 - 26140314 = 2110428 ?
Would it be similar in paired-end and single-end mapping runs?