difference between mapq=0 and samtools flag 0x900
1
0
Entering edit mode
3.6 years ago
bitpir ▴ 200

Hello,

I am comparing the # of uniquely mapped reads from the same fastq (illumina paired end) to 2 reference sequences using BWA-MEM. I expect mapping to one of the reference sequences (let's call it ref A) to be have many multi-mappers and mapping the other (ref B) to be more unique. Ref A has many duplicate sequences.

I'm using the same bwa-mem parameter for alignment:

bwa mem -M -t 16 -B 1 -T 40


I calculated uniquely mapped reads using samtools view:

unique_alignment=$(samtools view -c -F 0x904) # for excluding secondary, supplementary, unmapped reads mapq30_unique=$(samtools view -c -q 10 -F 0x904)


The results are below:

    # mapped reads  # uniquely mapped reads % of total seq  # of unmapped   # of MAPQ=10    % MAPQ=10
refA    15758104    15008036    95.24011264 0   10384098    65.7639396211625
refB    16352354    15632108    95.595459834101 0   14264615    87.9141156068417


As you can see, there isn't any difference in % of uniquely mapped reads (~95% for both) but a difference is seen in MAPQ=10.

From what I read, MAPQ=0 means reads are mapped to many places. I was wondering why using samtools view alone fail to capture uniquely mapped reads, and additional MAPQ filtering is necessary to get the unique mapped reads?

Thanks for the help!

samtools mapq • 2.1k views
1
Entering edit mode
3.6 years ago

All mapped reads have a primary alignment, which is what you end up counting with -F 0x904. You're not just excluding "multimappers" (that's not a terribly useful term) with the addition of -q 10, but also more generically uncertain alignments.

If you really want to see how well your dataset aligns to the two genomes then you need to instead concatenate the genomes together and align against the resulting merged genome. Then split the results by genome.