Question: Meaning of BWA-MEM MAPQ values
3
gravatar for Isaac C. Joseph
13 months ago by
Santa Clara, CA
Isaac C. Joseph 60 wrote:

Does anyone know what the MAPQ values produced by BWA-MEM mean?

I'm looking for something similar to what Keith Bradnam discovered for TopHat v1.4.1, where he realized that:

0 = maps to 5 or more locations

1 = maps to 3-4 locations

3 = maps to 2 locations

255 = unique mapping

I'm familiar with the notion that MAPQ should be theoretically be related to the probability of an "incorrect" alignment (10^(-MAPQ/10)), although this is vaguely-specified enough that aligners in practice tend to actually just use something like the above.

Note that this is different than the BWA MAPQ scoring interpretation, because BWA-MEM gives MAPQ scores in the range [0,60], rather than [0,37] as has been established for BWA.

bwa dna-seq alignment • 3.5k views
ADD COMMENTlink modified 12 months ago by John12k • written 13 months ago by Isaac C. Joseph 60
2

My experience is that MAPQ in bwa-aln has almost no predictive value, but it is pretty useful in bwa-mem and seems to correlate with the stated definition.

Note that statements such as "maps to 2 locations" have little meaning. Any read can map to (or originate from) any location in any genome. Even if you add "...equally well", that's still subjective. So, ultimately, MAPQ is just the aligner's confidence that the read originated from the stated location, and Tophat apparently does not attempt to implement the specification. Other aligners typically do. No per-aligner decoding table is needed; if the MAPQ is 3 or less, the aligner has roughly 50% or lower confidence that the mapping is correct, implying (but not requiring) that there is at least one other location with similarly high confidence.

ADD REPLYlink modified 13 months ago • written 13 months ago by Brian Bushnell16k

Very good point regarding the meaninglessness of "maps to n locations."

it is pretty useful in bwa-mem and seems to correlate with the stated definition.

Interesting. Do you have any more evidence on this? If so, might be worth a blog post like the above.

ADD REPLYlink modified 13 months ago • written 13 months ago by Isaac C. Joseph 60

I generated synthetic data and mapped it to create a ROC curve a while back, and my recollection was that it looked reasonable. You can do this yourself if you want for any aligner with BBMap's randomreads.sh and samtoroc.sh. It only works with single-ended reads though because of the sam specification's requirement for renaming reads.

ADD REPLYlink modified 13 months ago • written 13 months ago by Brian Bushnell16k

Just quibbling, but the blog post you linked found for Tophat 1.4.1:

0 = maps to 5 or more locations
1 = maps to 3-4 locations
3 = maps to 2 locations
255 = unique mapping

And for Tophat2:

0 = maps to 10 or more locations
1 = maps to 4-9 locations
2 = maps to 3 locations
3 = maps to 2 locations
50 = unique mapping
ADD REPLYlink written 13 months ago by h.mon20k

...herein lies the utter confusion and madness of how various programs interpret MAPQ. The SAM specification clearly states that

a value 255 indicates that the mapping quality is not available

ADD REPLYlink modified 13 months ago • written 13 months ago by Kevin Blighe30k
1

Well, for Bowtie, it isn't available :)

ADD REPLYlink written 13 months ago by Brian Bushnell16k

Maybe someone should make an attempt to find a common standard^^

ADD REPLYlink written 13 months ago by ATpoint8.1k

I was think of changing it from a number to just the word "CORRECT" or "WRONG" which seems more straightforward.

ADD REPLYlink written 13 months ago by Brian Bushnell16k

but incompatible with every downstream software out there, right?

ADD REPLYlink modified 13 months ago • written 13 months ago by ATpoint8.1k

Ah, you got me there =)

ADD REPLYlink written 13 months ago by Brian Bushnell16k
2
gravatar for John
12 months ago by
John12k
Germany
John12k wrote:

MAPQ scores are not meaningful because BAM is not meaningful - or rather, the field has yet to define the difference between read-alignments (what BAM officially stores) and fragment-alignments (what most aligners produce).

The issue goes far deeper than MAPQ scores, but if you want to read about MAPQ this is a good resource: https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/

But attempting to create a common standard is impossible. People want different things out of the MAPQ score, and so there can not be a single standard. Some people want it to represent how well the alignment aligns, others how well the fragment in it's entirety aligns, others a combination of both, sometimes with and sometimes without taking into account how many other possible alignments there might be for the read/fragment. It's a mess, and it all stems from the fact that pretty much every aligner produces non-standard BAMs except BWA, which produces BAMs no one would use if they needed to store more than one alignment per sequencing event.

ADD COMMENTlink modified 12 months ago • written 12 months ago by John12k
0
gravatar for Kevin Blighe
13 months ago by
Kevin Blighe30k
Kevin Blighe30k wrote:

From what I can see, the SAM specification document (http://samtools.github.io/hts-specs/SAMv1.pdf) states that the mapping quality can be anything from 0 to (2^8)-1, i.e., 0-255. See the table in section 1.4 of the document.

Regarding the values used by TopHat, it's regrettable that it drifts from the standard specification for MAPQ, but things like this happen in bioinformatics.

Kevin

ADD COMMENTlink written 13 months ago by Kevin Blighe30k

Yes, I realize that anything from 0 to 255 is acceptable according to the spec, but the meaning (in terms of number of mapping locations, likely) of the very small subset of that range that's actually used in practice by BWA-MEM is what I'm interested in.

ADD REPLYlink written 13 months ago by Isaac C. Joseph 60

As far as I know, BWA mem assignes 0-60 and 255, with 0 = equally well mapping to > 1 location, 255 = MAPQ not available and 1-60 the transformed probability as you stated. Every aligner uses different MAPQ scoring. Google around a bit, there a plenty of blogs about the MAPQ mess with the common alignment tools.

ADD REPLYlink written 13 months ago by ATpoint8.1k
1

Note that BWA-MEM is not the same as BWA algorithmically and thus in terms of MAPQs produced, and lack of finding anything on Google® for BWA-MEM specifically led to this question.

ADD REPLYlink modified 13 months ago • written 13 months ago by Isaac C. Joseph 60

Yes BWA mem doesn't go above 60, a value of which, if interpreted literally and untransformed, means that the read has been aligned with a 1-in-1 million error rate. Over the genomic scale, I guess that this is still not sufficient.

Lately I have been choosing only uniquely mapping reads, and dealing with the lost coverage. However, validation work we did in clinical genetics shows that variants are best called at 30 read depth (not 100, not 1000, etc)., with a minimum 'comfortable' cutoff of 18.

Kevin

ADD REPLYlink modified 13 months ago • written 13 months ago by Kevin Blighe30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1818 users visited in the last hour