Question

BWA alignment score calculation

2

Entering edit mode

2.9 years ago

Donovan ▴ 20

Hi,

I'm trying to understand how the alignment score (AS) is calculated by BWA MEM. I'm running BWA MEM (0.7.17-r1188) on single ended reads with default parameters. The command line interface detail how the score is calculated. It indicates a match is 1, a mismatch is -4, a gap opening is -6, and a gap extension is -1. However, this doesn't seem to be the case. Below are two mappings which both have 150 matches (CIGAR = 150M) and 1 mismatch (NM:i:1), but the first mappings has an AS of 149 and the second a AS of 145.

Anyone have any insight into why this might be the case?

Thanks, Donovan

Q1 16 Ref1 5022 4 150M * 0 0 NM:i:1 MD:Z:0G149 AS:i:149 XS:i:145

Q1 256 Ref1 2599 0 150M 0 0 * NM:i:1 MD:Z:56A93 AS:i:145

score SAM BWA alignment • 3.2k views

ADD COMMENT • link updated 13 months ago by kmzhou4 • 0 • written 2.9 years ago by Donovan ▴ 20

score 0 · Answer 1 · 2021-06-10

BWA's scoring algorithm and output format were written prior to the SAM specification including match and mismatch operators (= and X, respectively) for the CIGAR strings. The M in the CIGAR string only means an "alignment match", which can be further classified as either a match or mismatch. The actual match and mismatch information for these older aligners (BWA and Bowtie2 for example) is encoded in the MD:Z field.

In your example, this read Q1 has a primary alignment (first line, SAM flag 16) and alternative alignment (second line, SAM flag 256).

The primary alignment chosen by BWA-MEM has an MD:Z field of 0G149, indicating a G mismatch at the beginning of the alignment followed by 149 matches. The alignment score is AS:i:149, which likely excludes the G mismatch because it is on the end of the alignment. If I had to guess, this may have been a heuristic strategy implemented in their algorithm to prefer mismatches on the ends of alignments (like soft clipping) over mismatches in the middle of reads, all other things equal. You can see there is an alternative alignment for this read whose score is AS:i:145, which refers to the second alignment you posted (with the 256 SAM flag).

The second line is the alternative alignment and has an MD:Z field of 56A93, indicating 56 matches, an A mismatch, then 93 matches. The alignment score is AS:i:145, which you would get from 149 matches and 1 mismatch (-4).

BWA-MEM is an improvement on the original BWA algorithm and likely contains some of these edge-case heuristics that may not be exactly described in their publication but could be found in their source code. If a BWA developer happens to see this, perhaps they could explain better.

SAM Tag Specifications: https://samtools.github.io/hts-specs/SAMtags.pdf

SAM Specifications: https://samtools.github.io/hts-specs/SAMv1.pdf

score 0 · Answer 2 · 2023-03-19

According to my observation, BWA score is calculated with Number_of_match1 + number_of_mismatch(-4) + number_of_gap*(-40) + gap_length(-1). The number is not exact, but close. I believe there should be parameters: 1 match score (any number say 10), 1 mismatch (say -20), gap open (-40), gap extension (-1)