I have been digging deep into variant calling this year, and I came across a weird case in my VCF file. INDEL lines usually report the position and reference allele of the nucleotide immediately before the indel.
Example: if I have an AAG insertion at position 5 of my scaffold, I get a VCF line like:
Chrom Pos Tag Ref Alt
Scaffold 4 . G GAAG ...(etc)
What happens in my file is:
Chrom Pos Tag Ref Alt
Scaffold 4 . TAG TAGAAG ...(etc)
Note that in the second case the "TA" before the "G" is also included. I checked, and these bases are part of the reference and are present in 95% of the reads that map there, the same reads that support the subsequent indel.
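To make the comparison concrete, here is a minimal Python sketch that strips the leading bases shared by REF and ALT while keeping one anchor base. The helper name `trim_shared_prefix` is my own and is not part of bcftools. It is also not a full normalization, which would additionally trim shared trailing bases and left-align the indel. It only shows that the two allele pairs above differ by the extra shared bases.

```python
from typing import Tuple


def trim_shared_prefix(ref: str, alt: str) -> Tuple[int, str, str]:
    """Strip leading bases shared by REF and ALT, keeping one anchor base.

    Hypothetical helper for illustration only; returns the number of
    bases removed together with the trimmed REF and ALT alleles.
    """
    trimmed = 0
    # Stop as soon as either allele is down to a single base, so the
    # anchor base before the indel is always preserved.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        trimmed += 1
    return trimmed, ref, alt


# The record I expected is already minimal: nothing is trimmed.
print(trim_shared_prefix("G", "GAAG"))      # (0, 'G', 'GAAG')
# The record I actually got loses its two extra leading bases.
print(trim_shared_prefix("TAG", "TAGAAG"))  # (2, 'G', 'GAAG')
```

If bases are trimmed this way, the reported position would have to move forward by the same number of bases for the record to describe the same event.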
What is happening? Why does bcftools call also include those bases in the reference and alternative alleles of the indel?