True score of alignment BWA-MEM
5.2 years ago
arunsub • 0

Hey everyone,

I recently started using BWA-MEM for aligning reads to the human genome.

Can anyone tell me why BWA-MEM does not report the true score of the alignment (corresponding to the MD tag) in the AS tag?

Thanks. Any input is appreciated.

alignment bwa • 3.7k views
5.2 years ago

The alignments that I get with bwa do contain AS tags, so it is strange that yours do not.

samtools view -H http://data.biostarhandbook.com/bam/demo.bam | grep PG


prints:

@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem /Users/ialbert/refs/ebola/2014.fa SRR1553425_1.fastq SRR1553425_2.fastq


whereas:

samtools view http://data.biostarhandbook.com/bam/demo.bam | cut -f 6,14 | head


prints:

101M    AS:i:101
101M    AS:i:101
101M    AS:i:101
71H30M  AS:i:30
...


Since a match is scored as 1, the alignment score for a 101M read with no mismatches is indeed 101.
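Note that `cut -f 6,14` only works when AS happens to be the 14th column; the order of optional SAM fields is not fixed. A minimal Python sketch that looks the tag up by name instead is shown here. The function name `cigar_and_tag` and the sample record are made up for illustration; the record is not taken from demo.bam.

```python
def cigar_and_tag(sam_line, tag="AS"):
    """Return (CIGAR, tag value) for one SAM record; value is None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]                      # CIGAR is column 6 of a SAM record
    for opt in fields[11:]:                # optional fields start at column 12
        name, typ, value = opt.split(":", 2)
        if name == tag:
            # 'i' marks an integer tag; other types are returned as strings
            return cigar, int(value) if typ == "i" else value
    return cigar, None


# Hypothetical record, for illustration only
record = "\t".join([
    "read1", "0", "ref", "1", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "NM:i:0", "MD:Z:10", "AS:i:10",
])
print(cigar_and_tag(record))
```

The same lookup works for any other tag, for example `cigar_and_tag(record, "MD")`.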

0

Thanks Istvan, my question was more about the value in the AS tag.

test3 0 chr20 47606481 60 100M * 0 0 AAAAAAAAAAATCAGTTTTCCACTGAGGAATGTCCATGATGAAGCAGCAACACTACACCTGGCCCTCATTCCCTTTTTTCCTTAAGTACCTTTCACTGAA 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 NM:i:2 MD:Z:29T68T1 AS:i:93 XS:i:0

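Computing the score from MD, we get 90 (98 matches and 2 mismatches; with the default match score of 1 and mismatch penalty of 4, 98 - 8 = 90), but the reported value is 93. I know that BWA-MEM keeps track of the true score, so is there any reason why it is not reported in the AS tag?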
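For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch. It assumes the bwa mem default scoring (match 1, mismatch 4) and an MD string without deletions; the function names are made up for illustration. Besides the end-to-end score, it also computes the best score over all prefixes of the read, that is, the score if the last bases were left out of the alignment. For the record above this gives 93, which happens to equal the reported AS. That is an observation about the numbers, not a confirmed description of how bwa computes AS.

```python
import re

MATCH, MISMATCH = 1, 4  # assumed bwa mem defaults (-A 1 -B 4)


def md_to_columns(md):
    """Expand an MD string into one bool per aligned base: True = match, False = mismatch.

    Assumption: the alignment has no deletions, so the MD string has no '^'.
    """
    if "^" in md:
        raise ValueError("deletions in MD are not handled by this sketch")
    cols = []
    for run, base in re.findall(r"(\d+)|([A-Za-z])", md):
        if run:
            cols.extend([True] * int(run))   # a run of matching bases
        else:
            cols.append(False)               # a single mismatching base
    return cols


def end_to_end_score(cols):
    """Score of the alignment over the whole read."""
    return sum(MATCH if m else -MISMATCH for m in cols)


def best_prefix_score(cols):
    """Highest running score reached at any point along the read."""
    best = running = 0
    for m in cols:
        running += MATCH if m else -MISMATCH
        best = max(best, running)
    return best


cols = md_to_columns("29T68T1")
print(end_to_end_score(cols), best_prefix_score(cols))
```

The second mismatch sits at the second-to-last base, so stopping just before it (29 + 68 matches, one mismatch) scores 97 - 4 = 93, while going through to the end costs another 4 and gains only 1.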

0

I vaguely recall reading a statement, in either the BWA manual or the SAM spec (I am unable to find it now), that the alignment score may not match the MD tag or the CIGAR string. It struck me as odd back then, but it has to do with the way the aligner works. CIGAR and MD can be determined faster than an alignment score, and that is one reason why AS is not required to be present by default.

I would trust the score in the AS tag as the correct one rather than the one computed from the MD tag.

0

In this 2013 thread, Heng Li explains why AS can differ from the score computed from MD or CIGAR. It might explain your observation.