Can variant callers distinguish between a SNP and the end of an indel?
1
0
13 months ago
curious ▴ 600

From 1000 genomes vcf:

1       191160243       rs68092106      TC      T       100     PASS    AC=1393;AF=0.278155
1       191160244       rs10801031      C       T       100     PASS    AC=1022;AF=0.204073


Seems a little suspicious that these have the same freq. Are these two rows reporting the same variant? Or can callers really distinguish between an indel and a SNP that perfectly overlaps the beginning or end of an indel?

TOPMed reports similar results to 1000 Genomes:

https://bravo.sph.umich.edu/freeze5/hg38/variant/1-191191114-C-T

https://bravo.sph.umich.edu/freeze5/hg38/variant/1-191191113-TC-T

So I would guess they really are two different variants, unless this is a known issue with variant calling. As long as the calling is sequence-based, I would think callers could tell them apart, right?

1000 genomes vcf variant calling • 382 views
1
13 months ago
Ram 34k

Seems a little suspicious that these have the same freq

They don't, and even if they did, it could be coincidence.

Are these two rows reporting the same variant?

No - one is a deletion (not an indel) and the other is an SNV. gnomAD clearly distinguishes between the two.
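
The difference is also visible in the REF/ALT columns themselves. Here is a minimal Python sketch, assuming biallelic records with plain nucleotide alleles (no symbolic or `*` alleles); `classify_variant` and `affected_positions` are hypothetical helper names, not part of any VCF library. It shows that the two rows are different variant types even though they touch the same reference base:

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic VCF record by comparing REF and ALT (plain bases only)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    if len(ref) < len(alt) and alt.startswith(ref):
        return "insertion"
    return "complex"


def affected_positions(pos: int, ref: str, alt: str) -> range:
    """1-based reference positions whose bases are changed or removed."""
    kind = classify_variant(ref, alt)
    if kind == "deletion":
        # The leading len(alt) bases are anchor bases shared by REF and ALT;
        # only the bases after them are deleted.
        return range(pos + len(alt), pos + len(ref))
    if kind == "SNV":
        return range(pos, pos + 1)
    if kind == "insertion":
        return range(pos, pos)  # empty: no reference base is changed
    return range(pos, pos + len(ref))


# The two records from the question (POS, REF, ALT).
records = [
    (191160243, "TC", "T"),
    (191160244, "C", "T"),
]
for pos, ref, alt in records:
    print(pos, ref, alt, classify_variant(ref, alt),
          list(affected_positions(pos, ref, alt)))
```

Both records affect position 191160244, but the first removes the C while the second substitutes it with a T, so a sequence-based caller sees two different haplotypes.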

0

Thanks, that makes sense

0

Actually, this raises another question. I came across this example:

1       37176590        rs138644175     AC      A       100     PASS    AC=3143;AF=0.627596;
1       37176591        rs12723973      C       A       100     PASS    AC=2851;AF=0.569289;


It seems the deletion would occur in ~62% of haplotypes, removing the C at position 37176591. It seems the SNV occurs in ~57% of haplotypes.

Considering the context: https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001405.25, I am not sure how you can have the described SNV ~57% of the time if the base it depends on is deleted ~62% of the time. Sorry, I feel a bit silly about this one.

1

They don't have to be part of the same 100% - each frequency is relative to the number of samples/chromosomes it was computed on, not the total number in the cohort. The deletion could be 62% among 200 chromosomes and the SNV could be 57% among a totally different or slightly overlapping 300 chromosomes. You'd need to look at this in each individual to see if there are any calls that don't make sense, like a hom-alt deletion AND a het SNV in the same diploid person.
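
A per-individual check along those lines could be sketched as below. This is only an illustration: the sample names and genotypes are made up, `alt_count` and `inconsistent_samples` are hypothetical helpers, and it assumes diploid, biallelic GT strings. The idea is that each chromosome copy can carry the deletion or the SNV at that base, but not both, so the ALT allele counts of the two records cannot add up to more than the ploidy:

```python
PLOIDY = 2  # assumption: diploid samples


def alt_count(genotype: str) -> int:
    """Count ALT alleles in a biallelic GT string such as '0|1' or '1/1'."""
    alleles = genotype.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele == "1")


def inconsistent_samples(deletion_gts: dict, snv_gts: dict) -> list:
    """Samples whose deletion + SNV ALT alleles exceed the number of chromosome copies."""
    bad = []
    for sample, del_gt in deletion_gts.items():
        snv_gt = snv_gts[sample]
        if alt_count(del_gt) + alt_count(snv_gt) > PLOIDY:
            bad.append(sample)
    return bad


# Made-up genotypes for three hypothetical samples.
deletion_gts = {"S1": "1|1", "S2": "0|1", "S3": "0|0"}
snv_gts = {"S1": "0|1", "S2": "1|0", "S3": "1|1"}

# S1 is hom-alt for the deletion AND het for the SNV, which cannot both be true.
print(inconsistent_samples(deletion_gts, snv_gts))  # ['S1']
```

With phased genotypes you could go one step further and check that the deletion and the SNV never sit on the same haplotype, but the count-based check already catches the hom-alt plus het case.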

Not to mention that gnomAD has wildly different frequencies on them.

0

like a hom-alt deletion AND a het SNV in the same diploid person.

This really helped it click. Thank you again for taking the time to respond to this question and all the others you have contributed to on Biostars.

1

No problem, happy to help!