Forum:Biofinysics - How does bowtie2 assign MAPQ scores?
9.3 years ago

Every few weeks there's a post here asking how to get 'unique' alignments from bowtie2 output; there's a question from me in the archive, too.

Today I found this 'experiment' in which John Urban from (I think) Brown University went to the trouble of simulating reads and a genome to finally find out:

English Summary:

(Remember that AS is the alignment score of the best alignment, XS is the alignment score of the second best alignment)

Bowtie2 does not use the number of times a read mapped when calculating MAPQ. Instead, it uses only AS and XS, where XS is either at or above the minimum score or below it. If AS == XS, the read is considered a true multiread and can only get a MAPQ of 0 or 1. If XS is below the minimum score, the read is considered a true uniread under the given scoring scheme. True unireads can get MAPQ scores of 0, 3, 8, 23, 24, 40 and 42. A score of 0 is reserved for true unireads whose AS is in the bottom 30% of allowable scores. The scores 3, 8, 23, 24, 40 and 42 are unique to true unireads. Therefore, if someone is hell-bent on keeping only "unireads" with decent alignment scores, the way to do it is to keep only reads with those MAPQ scores.
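For readers who prefer code to prose, here is a minimal Python sketch of the two cases summarized above, assuming end-to-end mode (a perfect alignment scores 0 and the default --score-min is L,-0.6,-0.6). The intermediate cut-offs between 30% and 80% of the allowable score range, and the 0.67 cut-off separating MAPQ 1 from 0 for multireads, are my reading of bowtie2's source (its BowtieMapq2 class) rather than something stated in the summary, so check them against your bowtie2 version. The names mapq_sketch and is_decent_uniread are hypothetical, a missing XS is treated as "no valid second-best alignment", and the case AS > XS >= minimum is deliberately not modeled.

```python
from typing import Optional

# Assumption: end-to-end mode, where a perfect alignment scores 0 and the
# default --score-min is the linear function L,-0.6,-0.6.
PERFECT_SCORE = 0.0

# (fraction of the allowable range above the minimum, MAPQ) for true unireads.
# Only "bottom 30% -> 0" comes from the summary; the other cut-offs are an
# assumption based on reading bowtie2's BowtieMapq2 code.
UNIREAD_LADDER = [(0.8, 42), (0.7, 40), (0.6, 24), (0.5, 23), (0.4, 8), (0.3, 3)]

# MAPQ values that, per the summary, only true unireads can receive.
DECENT_UNIREAD_MAPQS = {3, 8, 23, 24, 40, 42}


def min_score(read_len: int, const: float = -0.6, coef: float = -0.6) -> float:
    """Minimum valid alignment score for a linear --score-min 'L,const,coef'."""
    return const + coef * read_len


def mapq_sketch(as_score: int, xs_score: Optional[int], read_len: int) -> Optional[int]:
    """Hypothetical re-creation of the uniread and AS == XS branches only."""
    sc_min = min_score(read_len)
    diff = max(PERFECT_SCORE - sc_min, 1.0)  # width of the allowable score range
    best_over = as_score - sc_min            # 0 means "worst allowable score"

    if xs_score is None or xs_score < sc_min:
        # True uniread: no valid second-best alignment.
        for frac, mapq in UNIREAD_LADDER:
            if best_over >= diff * frac:
                return mapq
        return 0  # AS in the bottom 30% of allowable scores
    if xs_score == as_score:
        # True multiread: only 0 or 1 (the 0.67 cut-off is an assumption).
        return 1 if best_over >= diff * 0.67 else 0
    return None  # AS > XS >= minimum: not modeled in this sketch


def is_decent_uniread(mapq: int) -> bool:
    """Filter suggested in the summary: keep only MAPQs unique to true unireads."""
    return mapq in DECENT_UNIREAD_MAPQS
```

For a 100 bp read the minimum score is -60.6, so in this sketch an AS of -15 with no XS lands at MAPQ 40, while AS == XS == -5 gives MAPQ 1, which is_decent_uniread rejects.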

P.S.: The follow-up post on multireads is worth reading too:


For those wondering, I posted the C code that his Python implementation is based on here last year. It's what I use in Bison.

BTW, even the concept of a uniread is a bit iffy, since whether a read counts as one can be changed by tweaking the --score-min argument (in my C code, this is the scMin argument, though it might be easier to just reference this file from Bison). My general recommendation is to ignore any concept related to uniqueness and just rely on the reliability of the alignment, namely the MAPQ score. Granted, MAPQ scores will also change when you play with --score-min, so keep that in mind.
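The dependence of "uniread" status on --score-min can be illustrated with a small Python sketch, assuming a linear --score-min of the form L,const,coef (the end-to-end default is L,-0.6,-0.6). The helper names are hypothetical, and the model is simplified: it keeps AS and XS fixed and only moves the threshold, whereas a real bowtie2 run with a different --score-min may also find or report different alignments.

```python
from typing import Optional


def score_min_linear(read_len: int, const: float, coef: float) -> float:
    """Minimum valid score for a linear --score-min function 'L,const,coef'."""
    return const + coef * read_len


def uniread_status(as_score: int, xs_score: Optional[int], read_len: int,
                   const: float, coef: float) -> str:
    """Classify one read under a given --score-min (hypothetical helper).

    Simplification: XS values are taken as given; in a real run, changing
    --score-min also changes which alignments bowtie2 finds and reports.
    """
    sc_min = score_min_linear(read_len, const, coef)
    if as_score < sc_min:
        return "unaligned"      # best hit no longer counts as a valid alignment
    if xs_score is None or xs_score < sc_min:
        return "uniread"        # no valid second-best alignment
    return "not uniread"        # a valid second-best alignment exists


# Same read (AS=-20, XS=-40, 100 bp) under two scoring thresholds.
default = uniread_status(-20, -40, 100, -0.6, -0.6)  # minimum is -60.6
strict = uniread_status(-20, -40, 100, -0.3, -0.3)   # minimum is -30.3
print(default, "->", strict)  # prints "not uniread -> uniread"
```

The same pair of scores flips from "not uniread" to "uniread" just by raising the minimum score, which is the reason to filter on MAPQ rather than on uniqueness (bearing in mind that MAPQ moves with --score-min too).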


Hi Devon, thanks for your C code! It helped me digest the bt2 MAPQ logic. I agree that the uniread concept is iffy. For others interested in why "uniread" is at best ill-defined, I wrote about this topic around the same time as the bowtie2 posts, in a post called "The slow death of the term 'uniquely mappable' in deep sequencing studies and resisting the 'conservative' urge to toss out data":


Hi Phillip, thanks for reading and promoting these posts. You are correct: I am currently at Brown University.


A much older musing (from 2009) by Heng Li on the subject:

Mapping uniqueness was not widely used two years ago and will not be widely used two years later. It is just a temporary concept, reflecting our lack of knowledge on measuring the reliability of an alignment

I keep track of this mostly to see when the "two years later" prediction will come true. We are getting there.

