Indel Notation In Variant Calling
11.0 years ago
Andrea_Bio ★ 2.7k

Hello

I am sorry for the basic question, but I am struggling to find any details of the nomenclature that variant calling software uses for indels. Unfortunately I am unable to access the details of the software used to call these variants at the moment, but I imagine the nomenclature is fairly standard. I tried looking in a BioScope guide, but it wasn't explained there, so I assume it is so obvious to those in the field that it does not need explaining. However, I'm not from the field and haven't worked with indels before, only SNPs.

Ref     Genotype
C       */-G
T       +C/*


What does the star mean here, and why is it sometimes before the / and sometimes after it?

Also, what does it mean when the reference allele is a * or an N?

Thanks a lot

indel • 3.5k views

Which package is emitting these calls? That doesn't conform to the VCF 4.0 format as I understand it (http://www.1000genomes.org/node/101)

I think it's BioScope, but I can't get hold of the data provider at present, hence my problem :(

11.0 years ago
Drio ▴ 920

It indicates you have a deletion (first row) or an insertion (second row) with respect to your reference genome. The '*' indicates that one of the two alleles matches your reference genome. A good way to understand and confirm all of this is to look at the alignments by eye (check the Broad's IGV or my favorite, samtools tview).
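
As a rough illustration of this reading, the sketch below splits a genotype string such as `*/-G` on the slash and labels each allele. The function names `describe_allele` and `describe_genotype` and the labels are made up for this example, and the rules (`*` = same as reference, `-X` = deletion of X, `+X` = insertion of X) are only the interpretation given in this answer, not a documented specification of the caller's output.

```python
def describe_allele(allele):
    """Label one allele under the assumed notation (hypothetical helper)."""
    if allele == "*":
        return ("reference", "")
    if allele.startswith("-"):
        return ("deletion", allele[1:])
    if allele.startswith("+"):
        return ("insertion", allele[1:])
    # Anything else (e.g. a plain base) is passed through without a guess.
    return ("other", allele)


def describe_genotype(genotype):
    """Split 'a/b' on the slash and label each allele in turn."""
    return [describe_allele(a) for a in genotype.split("/")]


for gt in ("*/-G", "+C/*"):
    print(gt, describe_genotype(gt))
```

Under these assumptions both example rows come out as heterozygous: one allele equal to the reference and one indel.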

Thanks for your answer. I don't have any alignments to check by eye, just this data. Why is the * sometimes before and sometimes after the /? If * means one of the alleles matches the reference, why don't they include the reference allele in the genotype instead? What does it mean when the reference allele is * or N?

Thanks for the answer. I don't have any alignments to check, otherwise I could have worked it out from them :) Sadly I just have this data. Why is the * sometimes before and sometimes after the slash? Does the order mean anything? What does it mean when the reference is a * or an N?

I don't think the order is relevant (check the other fields). If there is an N, it means the reference genome has no determined nucleotide at that position (N is the IUPAC code for an unknown base). A * in the reference would indicate that there is a homozygous insertion with respect to the reference.
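
If the order really is irrelevant, two genotype strings can be compared by sorting their alleles first. Here is a minimal sketch under that assumption. The helper names `normalize` and `zygosity` are invented for this example, and treating `*` as "same as reference" follows the answer above rather than any format documentation.

```python
def normalize(genotype):
    """Return the alleles of 'a/b' as a sorted tuple so that order is ignored."""
    return tuple(sorted(genotype.split("/")))


def zygosity(genotype):
    """Classify a two-allele genotype, assuming '*' means 'same as reference'."""
    a, b = normalize(genotype)
    if a == b:
        return "homozygous reference" if a == "*" else "homozygous variant"
    if "*" in (a, b):
        return "heterozygous (one reference allele)"
    return "heterozygous (two different variant alleles)"


print(normalize("*/-G") == normalize("-G/*"))
print(zygosity("+C/*"))
print(zygosity("-CCCC/-CCCCC"))
```

The sketch only handles genotypes written as exactly two alleles separated by a slash. A single-allele genotype such as `T` would need separate handling.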

OK, so something like this (ref genotype) * /-TT means both alleles had a TT deletion, and this * +C/ means both alleles had a C inserted (order doesn't matter). What notation is this, and what package is it created by?

Sorry, one more quick thing: what does it mean when a SNP has * for the reference, e.g. (ref genotype) * T or * Y?

If * for the reference means a homozygous insertion, then I don't understand this ref/genotype pair, which is a deletion: * -CCCC/-CCCCC. Also, what would this ref/genotype pair mean: * */-C?

Take this genotype: T -ACTC/* where (T) is the reference and (-ACTC/*) is the genotype. My best guess is that the ref = T and 2 alleles were observed, one the same as the ref (*) and one a deletion. Now take this genotype: * -ACTC/* where * is the ref and -ACTC/* is the genotype. Based on your information, this genotype should be a homozygous deletion with respect to the reference and also have one allele the same as the reference. Those 2 conditions are mutually exclusive.