While working on getting this issue fixed in VarScan, I'm trying to generate (or rather, correct from VarScan's original output) a VCF record for two samples, each with a different indel at the same position.
To keep it simple, the situation is:
- First reference base: C
- Indel in sample 1: CAA -> C (loss of 2 bases)
- Indel in sample 2: CA -> C (loss of 1 base)
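Written as (REF, ALT) pairs anchored at the same position, the two alleles have different REF strings (CAA vs. CA). A VCF record carries a single REF, so one way to put both alleles on a common REF is to use the longest one and append the uncovered trailing reference bases to the shorter allele's ALT. Below is a minimal Python sketch of that padding step. `merge_alleles` is a hypothetical helper for illustration, not part of VarScan or GATK, and it assumes every REF starts at the same position on the same reference.

```python
def merge_alleles(alleles):
    """Rewrite (ref, alt) pairs sharing one POS so they all use the longest REF.

    Hypothetical helper. Assumes each REF is a prefix of the longest REF,
    i.e. all alleles start at the same position on the same reference.
    """
    # The shared REF must span the longest deletion.
    ref = max((r for r, _ in alleles), key=len)
    alts = []
    for r, a in alleles:
        if not ref.startswith(r):
            raise ValueError(f"REF {r!r} is not a prefix of {ref!r}")
        # Reference bases this allele's REF did not cover are appended to its ALT,
        # so the allele still describes the same change against the longer REF.
        suffix = ref[len(r):]
        alts.append(a + suffix)
    return ref, alts


sample1 = ("CAA", "C")  # deletion of 2 bases
sample2 = ("CA", "C")   # deletion of 1 base
ref, alts = merge_alleles([sample1, sample2])
print(ref, ",".join(alts))  # CAA C,CA
```

With this padding, a single combined record would read REF=CAA, ALT=C,CA, with sample 1 carrying ALT allele 1 and sample 2 carrying ALT allele 2. The sketch does not collapse ALTs that become identical after padding, and it does not write genotypes.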
I know from the data that this is likely an artifact (it's in a low-coverage region), but I still need to generate a proper record for it, or my analysis pipeline will not work (GATK complains about an invalid record; see the last post in the link for more details).
How would I go about representing this in a VCF? In particular, how should I write the REF and ALT fields? Should I split this into two records, or keep everything in one?