I am comparing the # of uniquely mapped reads from the same fastq (illumina paired end) to 2 reference sequences using BWA-MEM. I expect mapping to one of the reference sequences (let's call it ref A) to be have many multi-mappers and mapping the other (ref B) to be more unique. Ref A has many duplicate sequences.
I'm using the same bwa-mem parameter for alignment:
bwa mem -M -t 16 -B 1 -T 40
I calculated uniquely mapped reads using samtools view:
unique_alignment=$(samtools view -c -F 0x904) # for excluding secondary, supplementary, unmapped reads mapq30_unique=$(samtools view -c -q 10 -F 0x904)
The results are below:
# mapped reads # uniquely mapped reads % of total seq # of unmapped # of MAPQ=10 % MAPQ=10 refA 15758104 15008036 95.24011264 0 10384098 65.7639396211625 refB 16352354 15632108 95.595459834101 0 14264615 87.9141156068417
As you can see, there isn't any difference in % of uniquely mapped reads (~95% for both) but a difference is seen in MAPQ=10.
From what I read, MAPQ=0 means reads are mapped to many places. I was wondering why using samtools view alone fail to capture uniquely mapped reads, and additional MAPQ filtering is necessary to get the unique mapped reads?
Thanks for the help!