Question

Bowtie2 MAPQ difference using local vs end-to-end alignment

2

Entering edit mode

5.1 years ago

polys ▴ 20

Hello everyone, I am currently trying to learn about ngs data analysis. Specifically, I have data from targeted (panel) ngs from human liquid biopsy samples, in which we have to determine the presence of somatic variants that are at a very low allele fraction (ctDNA in peripheral blood). The company that sold us this service conveniently forgot to mention that they were not going to hand us the final results, so it's up to us to decide, based on the BAM files they provided us with, which mutations could be considered valid and which might be artifacts or errors.... This is all very new to me, so I have lots of doubts. They are offering support, but there are some things that I can't seem to fully understand.

The thing is, when I look at specific positions in the BAM file, take KRAS codon 12 as an example, I can see that there is a variant in that position, but MAPQ values for the amplicons that support the position are lower than MAPQ values for amplicons that support the reference (18 vs 33). The company's support division explained to me that those differences in MAPQ could indicate that the variant is not actually real and that it should be confirmed by another method (confirming variants is not an option for us). Biologically speaking, since we are dealing with a cancer patient sample, and giving that KRAS codon 12 is a very common and important mutation, we thought: what are the chances that this specific mutation was produced by an error?

So, I decided to make some processing of my own, since they provided the raw data files, always focusing on KRAS codon 12 to compare results:

First of all, I tried aligning with BWA. Results were great, KRAS mutated reads had MAPQ of 60, same as reference reads
I then aligned the data using Bowtie2 (this is what the company uses) in end-to-end mode. Again, results were great (98,9% overall rate alignment), and no difference in MAPQ between mutated/reference reads (MAPQ 42)
Finally, I got support people to tell me the exact command that they had used to get to the final BAM file, and realized that they were using local alignment mode. Run the alignment in this mode (rate alignment was raised a little bit, I think around 99,2%, cannot remember), and finally got similar BAM to the company's BAM: MAPQ 18 vs 33.

So, I wanted an expert opinion on this, since, I repeat, this is all very new to me. Is it advantageous to perform local alignment in targeted ngs?? I cannot understand how local aligment can hurt MAPQ so much for these amplicons given that there seems to be no soft-clipped bases in the alignment.

I'm sorry if it got too long, want to be as detailed as possible. Hope you can help me

Thanks in advance!

target ngs alignment bowtie2 • 6.4k views

ADD COMMENT • link updated 5.1 years ago by benformatics 3.9k • written 5.1 years ago by polys ▴ 20

score 3 · Answer 1 · 2019-03-15

The major difference is that for a local alignment you try to match your read to substring of your reference. This is a less conservative approach and essentially assumes that your reads are derived directly from your reference sequence (which they aren't, you have a SNP there).

That is likely the very reason why the score is lower for a local alignment. The alignment is less deal because a SNP is present... so in turn the score is lower. End-to-end alignment are the way to go.

That being said, BWA is perfectly fine. I would trust those results.

Here is the relevant bowtie2 manual entry:

End-to-end alignment score example

A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). Thus, in end-to-end alignment mode, if the read is 50 bp long and it matches the reference exactly except for one mismatch at a high-quality position and one length-2 read gap, then the overall score is -(6 + 11) = -17.

The best possible alignment score in end-to-end mode is 0, which happens when there are no differences between the read and the reference.

Local alignment score example

A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). A base that matches receives a bonus of +2 be default. Thus, in local alignment mode, if the read is 50 bp long and it matches the reference exactly except for one mismatch at a high-quality position and one length-2 read gap, then the overall score equals the total bonus, 2 * 49, minus the total penalty, 6 + 11, = 81.

The best possible score in local mode equals the match bonus times the length of the read. This happens when there are no differences between the read and the reference.

In your case the difference sounds like your SNP in question might be a gap is considered a gap or something?