According to the VCF spec, indel variants need a leading non-polymorphic nucleotide in all alleles:
For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event;
Why is this? Other systems accept or require variants without the leading base.
VCF: CAGTAGTGA/C Other: AGTAGTGA/-
VCF: C/CAGTAGTGA Other: -/AGTAGTGA
Of course, the starting positions of the variants also differ by 1 between these two notations.
Is one notation form better than the other?
Is lossless conversion always possible between these two notations?
I.e. just add or remove the leading base and increment or decrease POS by 1? Is there a script/tool/code snippet that already does this?
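For concreteness, here is a minimal sketch of the naive conversion I have in mind. The function names and the position 100 are hypothetical, and it assumes simple bi-allelic indels where REF and ALT share the first base; it ignores the position-1 case from the spec and assumes the other notation places the variant exactly one base after the VCF POS. Note that going from the dash notation back to VCF requires the reference base before the event, which the dash notation itself does not carry, so it would have to be looked up in the reference genome.

```python
def vcf_to_dash(pos, ref, alt):
    """Strip the shared leading base from a simple VCF indel (1-based POS)."""
    if not ref or not alt or ref[0] != alt[0]:
        raise ValueError("REF and ALT must share a leading anchor base")
    if min(len(ref), len(alt)) != 1 or max(len(ref), len(alt)) < 2:
        raise ValueError("not a simple insertion or deletion")
    # An allele that becomes empty after stripping is written as "-".
    return pos + 1, ref[1:] or "-", alt[1:] or "-"


def dash_to_vcf(pos, ref, alt, anchor_base):
    """Prepend the anchor base to a dash-notation indel.

    anchor_base is the reference base at pos - 1; it is an assumption here
    that the caller has already fetched it from the reference sequence.
    """
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    return pos - 1, anchor_base + ref, anchor_base + alt


# Round trip for the deletion example above, at a made-up position 100.
print(vcf_to_dash(100, "CAGTAGTGA", "C"))       # (101, 'AGTAGTGA', '-')
print(dash_to_vcf(101, "AGTAGTGA", "-", "C"))   # (100, 'CAGTAGTGA', 'C')
```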