Question: Meaning of BWA-MEM MAPQ values
2
gravatar for Isaac C. Joseph
10 months ago by
Santa Clara, CA
Isaac C. Joseph 40 wrote:

Does anyone know what the MAPQ values produced by BWA-MEM mean?

I'm looking for something similar to what Keith Bradnam discovered for TopHat v1.4.1, where he realized that:

0 = maps to 5 or more locations

1 = maps to 3-4 locations

3 = maps to 2 locations

255 = unique mapping

I'm familiar with the notion that MAPQ should be theoretically be related to the probability of an "incorrect" alignment (10^(-MAPQ/10)), although this is vaguely-specified enough that aligners in practice tend to actually just use something like the above.

Note that this is different than the BWA MAPQ scoring interpretation, because BWA-MEM gives MAPQ scores in the range [0,60], rather than [0,37] as has been established for BWA.

bwa dna-seq alignment • 2.4k views
ADD COMMENTlink modified 9 months ago by John12k • written 10 months ago by Isaac C. Joseph 40
2

My experience is that MAPQ in bwa-aln has almost no predictive value, but it is pretty useful in bwa-mem and seems to correlate with the stated definition.

Note that statements such as "maps to 2 locations" have little meaning. Any read can map to (or originate from) any location in any genome. Even if you add "...equally well", that's still subjective. So, ultimately, MAPQ is just the aligner's confidence that the read originated from the stated location, and Tophat apparently does not attempt to implement the specification. Other aligners typically do. No per-aligner decoding table is needed; if the MAPQ is 3 or less, the aligner has roughly 50% or lower confidence that the mapping is correct, implying (but not requiring) that there is at least one other location with similarly high confidence.

ADD REPLYlink modified 10 months ago • written 10 months ago by Brian Bushnell15k

Very good point regarding the meaninglessness of "maps to n locations."

it is pretty useful in bwa-mem and seems to correlate with the stated definition.

Interesting. Do you have any more evidence on this? If so, might be worth a blog post like the above.

ADD REPLYlink modified 10 months ago • written 10 months ago by Isaac C. Joseph 40

I generated synthetic data and mapped it to create a ROC curve a while back, and my recollection was that it looked reasonable. You can do this yourself if you want for any aligner with BBMap's randomreads.sh and samtoroc.sh. It only works with single-ended reads though because of the sam specification's requirement for renaming reads.

ADD REPLYlink modified 10 months ago • written 10 months ago by Brian Bushnell15k

Just quibbling, but the blog post you linked found for Tophat 1.4.1:

0 = maps to 5 or more locations
1 = maps to 3-4 locations
3 = maps to 2 locations
255 = unique mapping

And for Tophat2:

0 = maps to 10 or more locations
1 = maps to 4-9 locations
2 = maps to 3 locations
3 = maps to 2 locations
50 = unique mapping
ADD REPLYlink written 10 months ago by h.mon16k

...herein lies the utter confusion and madness of how various programs interpret MAPQ. The SAM specification clearly states that

a value 255 indicates that the mapping quality is not available

ADD REPLYlink modified 10 months ago • written 10 months ago by Kevin Blighe24k
1

Well, for Bowtie, it isn't available :)

ADD REPLYlink written 10 months ago by Brian Bushnell15k

Maybe someone should make an attempt to find a common standard^^

ADD REPLYlink written 10 months ago by ATpoint5.7k

I was think of changing it from a number to just the word "CORRECT" or "WRONG" which seems more straightforward.

ADD REPLYlink written 10 months ago by Brian Bushnell15k

but incompatible with every downstream software out there, right?

ADD REPLYlink modified 10 months ago • written 10 months ago by ATpoint5.7k

Ah, you got me there =)

ADD REPLYlink written 10 months ago by Brian Bushnell15k
2
gravatar for John
9 months ago by
John12k
Germany
John12k wrote:

MAPQ scores are not meaningful because BAM is not meaningful - or rather, the field has yet to define the difference between read-alignments (what BAM officially stores) and fragment-alignments (what most aligners produce).

The issue goes far deeper than MAPQ scores, but if you want to read about MAPQ this is a good resource: https://sequencing.qcfail.com/articles/mapq-values-are-really-useful-but-their-implementation-is-a-mess/

But attempting to create a common standard is impossible. People want different things out of the MAPQ score, and so there can not be a single standard. Some people want it to represent how well the alignment aligns, others how well the fragment in it's entirety aligns, others a combination of both, sometimes with and sometimes without taking into account how many other possible alignments there might be for the read/fragment. It's a mess, and it all stems from the fact that pretty much every aligner produces non-standard BAMs except BWA, which produces BAMs no one would use if they needed to store more than one alignment per sequencing event.

ADD COMMENTlink modified 9 months ago • written 9 months ago by John12k
0
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe24k
Republic of Ireland
Kevin Blighe24k wrote:

From what I can see, the SAM specification document (http://samtools.github.io/hts-specs/SAMv1.pdf) states that the mapping quality can be anything from 0 to (2^8)-1, i.e., 0-255. See the table in section 1.4 of the document.

Regarding the values used by TopHat, it's regrettable that it drifts from the standard specification for MAPQ, but things like this happen in bioinformatics.

Kevin

ADD COMMENTlink written 10 months ago by Kevin Blighe24k

Yes, I realize that anything from 0 to 255 is acceptable according to the spec, but the meaning (in terms of number of mapping locations, likely) of the very small subset of that range that's actually used in practice by BWA-MEM is what I'm interested in.

ADD REPLYlink written 10 months ago by Isaac C. Joseph 40

As far as I know, BWA mem assignes 0-60 and 255, with 0 = equally well mapping to > 1 location, 255 = MAPQ not available and 1-60 the transformed probability as you stated. Every aligner uses different MAPQ scoring. Google around a bit, there a plenty of blogs about the MAPQ mess with the common alignment tools.

ADD REPLYlink written 10 months ago by ATpoint5.7k
1

Note that BWA-MEM is not the same as BWA algorithmically and thus in terms of MAPQs produced, and lack of finding anything on Google® for BWA-MEM specifically led to this question.

ADD REPLYlink modified 10 months ago • written 10 months ago by Isaac C. Joseph 40

Yes BWA mem doesn't go above 60, a value of which, if interpreted literally and untransformed, means that the read has been aligned with a 1-in-1 million error rate. Over the genomic scale, I guess that this is still not sufficient.

Lately I have been choosing only uniquely mapping reads, and dealing with the lost coverage. However, validation work we did in clinical genetics shows that variants are best called at 30 read depth (not 100, not 1000, etc)., with a minimum 'comfortable' cutoff of 18.

Kevin

ADD REPLYlink modified 10 months ago • written 10 months ago by Kevin Blighe24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 581 users visited in the last hour