Dear community members,
I do not work with SNVs directly and cannot check this myself, but I use allele frequencies of SNVs for CNA calling, so this question matters to me too.
By "allele frequency" I mean the ratio of the number of reads supporting the alternative allele to the overall read depth at the position.
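To make the definition concrete, here is a minimal Python sketch. The function name `allele_fraction` and its arguments are only illustrative and do not come from any particular tool.

```python
def allele_fraction(alt_reads: int, depth: int) -> float:
    """Reads supporting the alternative allele divided by total read depth."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")
    return alt_reads / depth


# A heterozygous site with 47 alternative reads out of 100 in total
print(allele_fraction(47, 100))  # 0.47
```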
It is well known that allele frequencies of SNVs are biased towards the reference. Allele frequencies of heterozygous SNVs are usually not 0.5 (half of the reads supporting the reference, half the alternative) but 0.47-0.48. Previously this was explained by mapping bias (reads carrying non-reference alleles are more difficult to align), and I was satisfied with that explanation.
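A deviation of this size is hard to see at a single site but becomes clear when many heterozygous sites are pooled. The sketch below is only an illustration under simplifying assumptions: reads are treated as independent binomial draws (no overdispersion), and the function name and the example counts are made up.

```python
import math


def pooled_reference_bias(sites: list[tuple[int, int]]) -> tuple[float, float]:
    """Pool (alt_reads, depth) pairs from heterozygous SNVs.

    Returns the pooled allele fraction and a z-score against the
    expected value of 0.5 (normal approximation to the binomial).
    """
    alt_total = sum(alt for alt, _ in sites)
    depth_total = sum(depth for _, depth in sites)
    pooled = alt_total / depth_total
    # Standard error of a proportion under the null hypothesis p = 0.5
    std_err = math.sqrt(0.25 / depth_total)
    return pooled, (pooled - 0.5) / std_err


# Made-up counts for four heterozygous sites, 100x depth each
pooled, z = pooled_reference_bias([(47, 100), (48, 100), (47, 100), (46, 100)])
```

A negative z-score means the pooled fraction lies below 0.5, i.e. is shifted towards the reference; with real data, thousands of sites would be pooled rather than four.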
A different explanation has now been proposed: the bias is due to soft-clipping and partially aligned reads (see page 15 of the paper for the graphical explanation). After the correction, the allele frequency becomes exactly 0.5.
My question is: what are the drawbacks of using this tool and applying this correction routinely (correcting the allele fraction in all VCFs)? Will anything change for SNV calling? Can the same be achieved without this tool, just by using other parameters for the aligner (e.g., bwa-mem)? Is the problem unique to the Isaac aligner, or does bwa-mem suffer from it too? I am absolutely sure it will improve my CNA calling, but the important point is whether I can insist on this correction for all parts of our pipeline or need to perform it separately for CNA calling only. I see a slight decrease in coverage after this procedure; do you think that will be a significant drawback?
As I explained at the beginning, I do not work with SNVs, and none of my colleagues who do were interested in discussing this topic. To check whether anything changes, I would suggest trimming 5-10 bp from the start and the end of every read after alignment, but I cannot do this myself (at least not quickly): I would have to learn SNV calling from scratch.
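The trimming idea can be sketched on toy data without a real variant caller. This is an illustration only, not the method of the tool discussed above: the `AlignedRead` type and the function name are hypothetical, and the reads are assumed to be gapless (no indels, no clipping). Instead of physically trimming reads, it ignores any read in which the site falls within `margin` bases of either read end, which has the same effect on the counts.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    start: int  # 0-based reference position of the first aligned base
    seq: str    # aligned bases, assumed gapless


def allele_fraction_ignoring_ends(reads, pos, alt, margin=5):
    """Allele fraction at `pos`, skipping reads where `pos` lies
    within `margin` bases of the read start or end."""
    alt_count = 0
    depth = 0
    for read in reads:
        offset = pos - read.start
        # Also skips reads that do not cover `pos` at all
        if offset < margin or offset >= len(read.seq) - margin:
            continue
        depth += 1
        if read.seq[offset] == alt:
            alt_count += 1
    return alt_count / depth if depth else float("nan")


reads = [
    AlignedRead(0, "A" * 10 + "T" + "A" * 10),  # alt allele, mid-read
    AlignedRead(0, "A" * 21),                   # ref allele, mid-read
    AlignedRead(8, "A" * 12),                   # ref allele, 2 bp from read start
]
untrimmed = allele_fraction_ignoring_ends(reads, pos=10, alt="T", margin=0)
trimmed = allele_fraction_ignoring_ends(reads, pos=10, alt="T", margin=5)
```

In this toy case the third read supports the reference only near its start, so ignoring read ends moves the fraction from 1/3 to 0.5 and lowers the depth from 3 to 2, which mirrors the slight decrease in coverage mentioned above.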
Thanks to anyone who is interested.