SnpSift annotate reporting extra alleles not present in the input VCF
7.7 years ago
fwuffy ▴ 100

Hello everyone

SnpSift annotate seems to be including a variant allele that is not in my input VCF when I annotate against ClinVar.

For example, here's a row from a test VCF:

1 201364336 rs74315379 G A


G>A is the only call made. G matches the reference assembly.

Yet in the INFO field, we get:

CLNHGVS=NC_000001.11:g.201364336G>A,NC_000001.11:g.201364336G>T;


Why would SnpSift also report the G>T variant? There is only one data row in the test file, and no mixed allele frequencies or anything else to confound the annotation. It should report only the records matching the A variant allele, because no other alleles are present in my sample.

I hope I'm missing something obvious here, but it appears to be grabbing every variant listed for that position in the ClinVar VCF I'm annotating against (just those two exist for that position in ClinVar), without regard to the actual variant allele. Is there any way to make SnpSift annotate only the variant that's in my sample?

Thanks

(Edits for clarity)

SnpEff SNP SnpSift • 2.4k views
7.5 years ago
DG 7.3k

SnpSift is only adding the ClinVar info and nomenclature that is available for that SNP ID/position: it grabs the info present in the ClinVar VCF file you are annotating with and adds it to the appropriate INFO field. My suspicion, which can be borne out by going into the ClinVar VCF file and looking, is that the record is not split by reported allele, which is why you get both in the annotation field. It is just a function of how things are reported in the ClinVar VCF.
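
If the ClinVar record really does carry both alleles in one INFO value, one workaround is to post-filter the annotated output yourself. Below is a minimal Python sketch of that idea, not a SnpSift feature: the function name `filter_clnhgvs` is made up for illustration, and it assumes simple substitutions only, where each CLNHGVS entry ends in `REF>ALT` as in the example from the question. Other per-allele ClinVar fields are not touched and may need similar handling.

```python
def filter_clnhgvs(info, ref, alts):
    """Keep only the CLNHGVS entries whose substitution matches REF>ALT.

    Hypothetical helper: assumes SNV-style HGVS strings ending in 'REF>ALT'.
    All other INFO fields are passed through unchanged.
    """
    suffixes = tuple(f"{ref}>{alt}" for alt in alts)
    fields = []
    for field in info.split(";"):
        if not field:
            continue  # skip empty pieces, e.g. after a trailing ';'
        key, sep, value = field.partition("=")
        if key == "CLNHGVS" and sep:
            kept = [v for v in value.split(",") if v.endswith(suffixes)]
            if not kept:
                continue  # no entry matches the sample's alleles, drop the key
            field = key + "=" + ",".join(kept)
        fields.append(field)
    return ";".join(fields)


# The row from the question: REF=G, ALT=A
info = "CLNHGVS=NC_000001.11:g.201364336G>A,NC_000001.11:g.201364336G>T;"
print(filter_clnhgvs(info, "G", ["A"]))
```

For a multi-allelic row you would pass every ALT from column 5, for example `filter_clnhgvs(info, "G", ["A", "T"])`, which keeps both entries.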