I'm looking for help understanding the example of spanning alleles and multi-allelic loci on https://luntergroup.github.io/octopus/docs/guides/advanced/vcf/
The ALT field values are OK, but the GT values don’t make sense to me.
The IGV-style display of the BAM files shows 3 samples:
1st, HG002, has a 4-bp deletion starting at 728 (729-732 deleted) in half the reads; the other half are REF.
2nd, HG003, has an 8-bp deletion starting at 732 (733-740 deleted) in half the reads; the other half are REF.
3rd, HG004, has both events, in different reads, i.e. anti-phased. The resulting coverage looks like a 12-bp loss in half the reads, but actually no reads are REF.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 HG003 HG004
chr4 19232687 . A G 50 PASS AC=2;AN=6 GT:PS 0|1:19232687 0|0:19232687 0|1:19232687
The upstream single-nt change defines the phase group (PS).
chr4 19232728 . ATCTG A 50 PASS AC=2;AN=6 GT:PS:PQ 0|1:19232687 0|0:19232687 0|1:19232687
GT=1 is the 4-bp deletion, heterozygous in the 1st and 3rd samples. OK.
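To double-check the coordinates I am reading off the records, here is a minimal sketch of the arithmetic. It assumes only the generic VCF convention that a deletion is written with one anchor base kept in ALT; `deletion_span` is a helper name I made up, not part of Octopus or any library.

```python
def deletion_span(pos, ref, alt):
    """Return (first_deleted, last_deleted, length) for a simple VCF deletion.

    Assumption: REF starts with ALT (the anchor base) and is longer than it,
    which is how left-anchored deletions are normally written in VCF.
    """
    if not (ref.startswith(alt) and len(ref) > len(alt)):
        raise ValueError("not a simple left-anchored deletion")
    length = len(ref) - len(alt)
    first = pos + len(alt)          # first base after the anchor
    last = first + length - 1
    return first, last, length


# Record at 19232728: ATCTG -> A
print(deletion_span(19232728, "ATCTG", "A"))          # (19232729, 19232732, 4)

# Allele 1 of the record at 19232732: GTCTGTCTATCTA -> G
print(deletion_span(19232732, "GTCTGTCTATCTA", "G"))  # (19232733, 19232744, 12)
```

The first result matches the 729-732 range I quoted for HG002. The second is just the same arithmetic applied to the `G` allele of the multi-allelic record.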
chr4 19232732 . GTCTGTCTATCTA G,*,* 50 PASS AC=2,2,2;AN=6 GT:PS 2|3:19232687 1|2:19232687 1|3:19232687
GT=1 is the 12-bp deletion. GT=2 and GT=3 are overlapping events defined in other records. Why is the 1st sample 2|3 instead of 0|2? Why is the 2nd sample not 0|3? No haplotype has a 12-nt deletion, so how is haplotype-aware calling working here?
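To make the question concrete, here is a small sketch of how I am decoding the GT indices for the record at 19232732. It relies only on the generic VCF rule that index 0 is REF and index n is the n-th ALT entry; it does not try to work out which upstream event each `*` refers to, which is exactly the part I am unsure about. The names `decode_gt` and `samples` are my own.

```python
from collections import Counter


def decode_gt(ref, alts, gt):
    """Map a phased GT string such as '2|3' to the allele strings it indexes."""
    alleles = [ref] + alts          # index 0 = REF, 1..n = ALT entries
    return [alleles[int(i)] for i in gt.split("|")]


ref = "GTCTGTCTATCTA"
alts = "G,*,*".split(",")
samples = {"HG002": "2|3", "HG003": "1|2", "HG004": "1|3"}

decoded = {name: decode_gt(ref, alts, gt) for name, gt in samples.items()}
for name, haplotypes in decoded.items():
    print(name, haplotypes)
# HG002 ['*', '*']
# HG003 ['G', '*']
# HG004 ['G', '*']

# Recompute AC from the genotypes to compare with INFO (AC=2,2,2;AN=6).
counts = Counter(i for gt in samples.values() for i in gt.split("|"))
ac = [counts[str(i)] for i in range(1, len(alts) + 1)]
an = sum(counts.values())
print(ac, an)  # [2, 2, 2] 6
```

So the AC and AN values are internally consistent with the genotypes as written. Read literally, HG003 and HG004 each carry the `G` (12-bp deletion) allele on one haplotype and HG002 carries no REF allele at this position, which is what I cannot reconcile with the reads.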
chr4 19232736 . G GTCTATCTA,* 50 PASS AC=2,2;AN=6 GT:PS 1|0:19232687 2|19232687 2|0:19232687
GT=1 is an 8-bp insertion. It is shown in the graphic as the purple vertical lines at 736; it overlaps the deletion starting at 732 and is present in the 1st and 2nd samples. The GT for HG002 makes sense to me. HG003 has a copying error? I think it should be 2|1. HG004 has the overlapping del *=2 and REF=0, which also makes sense.
Please can I have more explanation for the G,*,* call?