Indel Notation In Variant Calling
11.0 years ago
Andrea_Bio ★ 2.7k

Hello

I am sorry for the basic question, but I am struggling to find any details on the nomenclature that variant calling software uses for indels. Unfortunately I am unable to access the details of the software used to call these variants at the moment, but I imagine the nomenclature is fairly standard. I looked in a BioScope guide but it wasn't explained there, so I assume it is so obvious to those in the field that it does not need explaining. However, I'm not from the field and haven't worked with indels before, only SNPs.

Ref     Genotype
C       */-G
T       +C/*


What does the star mean here, and why is it sometimes before the / and sometimes after it?

Also, what does it mean when the reference allele is a * or an N?

thanks a lot

Which package is emitting these calls? That doesn't conform to the VCF 4.0 format as I understand it (http://www.1000genomes.org/node/101)

I think it's BioScope, but I can't get hold of the data provider at present, hence my problem :(

11.0 years ago
Drio ▴ 920

It indicates you have a deletion (first row) or an insertion (second row) with respect to your reference genome. The '*' indicates that one of the two alleles in the genotype matches your reference genome. A good way to understand and confirm all this is to look at the alignments by eye (check the Broad's IGV or my favorite, samtools tview).
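
If it helps, here is a minimal Python sketch of that reading of the notation. It assumes the genotype column holds two alleles separated by '/', that '*' means "same as the reference", and that a leading '-' or '+' marks deleted or inserted bases. The function names are made up for illustration and the rules are only my interpretation of the rows above, not something taken from the caller's documentation.

```python
# Hypothetical helpers: the rules below are an interpretation of the
# Ref/Genotype rows in this thread, not an official specification.

def parse_allele(allele):
    """Classify a single allele string from the Genotype column."""
    if allele == "*":
        return ("reference", "")          # allele matches the reference
    if allele.startswith("-"):
        return ("deletion", allele[1:])   # bases deleted relative to the reference
    if allele.startswith("+"):
        return ("insertion", allele[1:])  # bases inserted relative to the reference
    return ("unknown", allele)            # anything this sketch does not cover


def parse_genotype(genotype):
    """Split a genotype such as '*/-G' on '/' and classify each allele."""
    return [parse_allele(a) for a in genotype.split("/")]


print(parse_genotype("*/-G"))   # first row of the question
print(parse_genotype("+C/*"))   # second row of the question
```

Under these assumptions both rows come out as one reference allele plus one indel allele, i.e. heterozygous calls, whichever side of the '/' the '*' is on.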

Thanks for your answer. I don't have any alignments to check by eye, just this data. Why is the * sometimes before and sometimes after the /? If * means one of the alleles matches the reference, why don't they include the reference allele in the genotype instead? And what does it mean when the reference allele is * or N?

I don't think the order is relevant (check the other fields). If there is an N, it means the reference genome has no determined nucleotide at that position (N is the IUPAC code for an unknown base). A * in the reference would indicate there is a homozygous insertion with respect to the reference.

OK, so something like this (ref genotype):
/-TT means both alleles had a TT deletion, and this +C/ means both alleles had a C inserted (order doesn't matter). What notation is this?

Sorry, one more quick thing: what does it mean when a SNP has * for the reference, e.g. (ref genotype) * T or * Y?

If * for the reference means a homozygous insertion, then I don't understand this ref/genotype, which is a deletion: * -CCCC/-CCCCC. Also, what would this ref/genotype mean: * */-C?

Take this genotype: T -ACTC/ where (T) is the reference and (-ACTC/) is the genotype. My best guess is that the ref is T and 2 alleles were observed, one the same as the ref and one a deletion. Now take this genotype: * -ACTC/ where * is the ref and -ACTC/ is the genotype. Based on your information, this genotype should be a homozygous deletion wrt the reference and also have one allele the same as the reference. Those 2 conditions are mutually exclusive.