UPDATE: I'll leave this post up since I got a really thorough response from the tool's developer himself. That response might be of great value to someone else later on. Thank you, jkbonfield!
Hey fellow bioinformaticians!
At this point I'm really confused about the -B option of mpileup.
First, to clarify what this option does: it "disables base alignment quality (BAQ) computation".
Now, in the documentation page of samtools mpileup:
BAQ is the Phred-scaled probability of a read base being misaligned. It greatly helps to reduce false SNPs caused by misalignments. BAQ is calculated using the probabilistic realignment method described in the paper “Improving SNP discovery by base alignment quality”
BUT, in the documentation page of bcftools mpileup, they say the exact opposite regarding BAQ:
-B, --no-BAQ: Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned. Applying this option greatly helps to reduce false SNPs caused by misalignments.
So the two pages seem to contradict each other. The samtools documentation says that computing BAQ helps reduce false SNPs, while the bcftools documentation says that disabling BAQ is what reduces them.
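For anyone unsure what "Phred-scaled" means in both quoted passages, here is a minimal Python sketch of the conversion, Q = -10 * log10(P). The function names are my own, not part of samtools or bcftools. The last helper rests on an assumption from my reading of the BAQ paper: when BAQ is enabled, the quality used for a base is capped at its BAQ value, i.e. the smaller of the two numbers.

```python
import math


def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled score Q into an error probability P."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Convert an error probability P into a Phred-scaled score Q."""
    return -10 * math.log10(p)


def baq_capped_quality(base_qual: int, baq: int) -> int:
    """Hypothetical helper.

    Assumption (not taken from the quoted docs): with BAQ enabled, the
    effective base quality is min(original base quality, BAQ).
    """
    return min(base_qual, baq)


if __name__ == "__main__":
    # A base with a high sequencing quality but a low BAQ ends up with
    # the low value under the assumption above.
    print(phred_to_prob(30))
    print(baq_capped_quality(40, 10))
```

Under that assumption, a base near a messy alignment (low BAQ) contributes little evidence even if its sequencing quality is high, which is how I understand BAQ to suppress false SNPs.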
Also, I've tried bcftools mpileup on the same set of data, once without -B and once with it. I got significantly different results: without this option, I got some INDELs in some samples, while with the -B argument, I got no INDELs at all in any of my 42 COVID-19 samples.
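The comparison I ran can be sketched roughly like this in Python. The file names (ref.fa, sample.bam) and the helper names are placeholders for illustration, not my actual pipeline. The only tool behaviour relied on is that bcftools mpileup accepts -f for the reference and -B to disable BAQ, and that bcftools marks indel records with an INDEL flag in the VCF INFO column.

```python
import shlex


def mpileup_cmd(ref: str, bam: str, disable_baq: bool) -> list:
    """Build a bcftools mpileup command line, optionally with -B."""
    cmd = ["bcftools", "mpileup", "-f", ref]
    if disable_baq:
        cmd.append("-B")  # -B / --no-BAQ: skip the BAQ computation
    cmd.append(bam)
    return cmd


def count_indels(vcf_lines) -> int:
    """Count VCF records whose INFO column carries the INDEL flag."""
    n = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = fields[7]  # INFO is the 8th VCF column
        if "INDEL" in info.split(";"):
            n += 1
    return n


if __name__ == "__main__":
    # Placeholder paths; substitute your own reference and BAM.
    for disable in (False, True):
        print(shlex.join(mpileup_cmd("ref.fa", "sample.bam", disable)))
```

Running count_indels on the text VCF produced by each of the two runs (after calling variants) gives a quick per-sample check of whether -B changes the number of INDEL records.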
Am I missing something? Did I misunderstand the developers' wording?
EDIT: in the ivar manual, the use of samtools mpileup is recommended, but for a different reason:
Please use the samtools mpileup to call variants and generate consensus. When a reference sequence is supplied, the quality of the reference base is reduced to 0 (ASCII: !) in the mpileup output. Disabling BAQ with -B seems to fix this. This was tested in samtools 1.7 and 1.8
I don't know what to make of all this.