Question

What are the different ways of referring to a genetic variant/mutation?

8.7 years ago

dk ▴ 10

There are different genetic variant types, such as SNPs and indels (insertions/deletions).

In the ClinVar database, these variants are given a long name, for example NM_172201.1(KCNE2):c.79C>T (p.Arg27Cys). But elsewhere, such as in publications and clinical reports, this full name is not used to refer to the variant. Can I take the gene name (KCNE2) and the coding-sequence change (c.79C>T) together? What are the other possible ways to refer to a variant? Can I use the protein change (p.Arg27Cys) as well? I want to use this to extract information about genetic variants from publications and clinical reports.

gene next-gen SNP variants sequence • 2.1k views

score 1 · Answer 1 · 2015-08-24

While it's convenient to keep track of the gene name, the transcript ID (e.g., NM_172201.1) is actually needed, since each gene may have multiple transcripts associated with it. The protein-level change is also nice to keep track of, though make sure you keep the associated protein ID (NP_751951.1 in this example), since otherwise you won't always know which isoform is changing.
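To make this concrete, here is a minimal sketch of splitting a ClinVar-style name into the pieces worth keeping. The regular expression and the names `CLINVAR_NAME`, `VariantName` and `parse_clinvar_name` are my own illustration, not part of any library, and the pattern only covers simple substitutions like the example in the question (no indels, duplications or intronic positions).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for names such as
# "NM_172201.1(KCNE2):c.79C>T (p.Arg27Cys)".
# Assumption: simple single-base substitutions only.
CLINVAR_NAME = re.compile(
    r"(?P<transcript>N[MR]_\d+(?:\.\d+)?)"  # transcript accession, optional version
    r"\((?P<gene>[A-Za-z0-9-]+)\)"          # gene symbol in parentheses
    r":(?P<cdna>c\.\d+[ACGT]>[ACGT])"       # coding-sequence change
    r"(?:\s*\((?P<protein>p\.[A-Za-z]{3}\d+[A-Za-z]{3})\))?"  # optional protein change
)


class VariantName(NamedTuple):
    transcript: str
    gene: str
    cdna: str
    protein: Optional[str]


def parse_clinvar_name(name: str) -> Optional[VariantName]:
    """Return the components of a ClinVar-style name, or None if it does not match."""
    m = CLINVAR_NAME.search(name)
    if m is None:
        return None
    return VariantName(m["transcript"], m["gene"], m["cdna"], m["protein"])


print(parse_clinvar_name("NM_172201.1(KCNE2):c.79C>T (p.Arg27Cys)"))
```

Keeping the transcript accession as its own field means you can tell later which entries were reported without one.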

The actual literature is a total mess in this regard. People are supposed to follow the nomenclature guidelines, but they don't always, and the guidelines themselves are not always particularly clear. So this will be a somewhat painful process. Make sure to manually spot-check some of the entries!

Ram · Answer 2 · 2015-08-24

The first notation you got from ClinVar is the recommended HGVS nomenclature and the most reliable, responsible way to report mutations in the literature. Ideally you should have the CDS position, the allele change, and the transcript name and version in order to reliably get the genomic coordinates of your mutation.

Quite often, especially in older publications, this will not be the case. In such cases, if you want to be thorough, you could apply the reported CDS position and allele change to every transcript of the gene in RefSeq and see which transcript gives a functional consequence that best matches the variant being reported.
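A rough sketch of that transcript check, under loud assumptions: the CDS sequences below are invented toy strings (not RefSeq records), the transcript names are placeholders, and `consequence` and `matching_transcripts` are hypothetical helpers. It handles only single-base substitutions on uppercase coding sequences, and it reports amino acids as one-letter codes.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered amino-acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Toy coding sequences, invented for illustration only.
TOY_TRANSCRIPTS = {
    "tx_A": "ATGCGTAAATGA",
    "tx_B": "ATGGCTCGTTGA",
    "tx_C": "ATGCCTAAATGA",
}


def consequence(cds, pos, ref, alt):
    """Return (ref_aa, codon_number, alt_aa) for a substitution at 1-based CDS
    position `pos`, or None if the reference base does not match."""
    if pos < 1 or pos > len(cds) or cds[pos - 1] != ref:
        return None
    start = (pos - 1) // 3 * 3          # 0-based start of the affected codon
    codon = cds[start:start + 3]
    if len(codon) < 3:                  # incomplete trailing codon
        return None
    offset = pos - 1 - start
    mutated = codon[:offset] + alt + codon[offset + 1:]
    return CODON_TABLE[codon], start // 3 + 1, CODON_TABLE[mutated]


def matching_transcripts(transcripts, pos, ref, alt, reported):
    """Names of transcripts where the substitution gives the reported protein change."""
    return [name for name, cds in transcripts.items()
            if consequence(cds, pos, ref, alt) == reported]


# A paper reports "c.4C>T, Arg2Cys" without naming a transcript.
print(matching_transcripts(TOY_TRANSCRIPTS, 4, "C", "T", ("R", 2, "C")))
```

With real data you would load each RefSeq transcript's CDS in place of the toy dictionary. More than one transcript can match, so treat the result as a candidate list rather than an answer.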

Getting the genomic coordinates from just the protein change is even harder. I usually consult a database like dbNSFP (https://sites.google.com/site/jpopgen/dbNSFP), which has prebuilt annotations for all possible non-synonymous mutations in the human genome, to see whether a protein change within a gene matches the mutation signature.
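As an illustration of that kind of lookup, here is a small sketch over a tab-delimited table. Everything about the table is an assumption: the column names are simplified placeholders, the rows and coordinates are invented, and `find_protein_change` is a hypothetical helper. Check the README of the dbNSFP release you download for the real headers and adapt accordingly.

```python
import csv
import io

# Invented toy table; real dbNSFP files use their own headers and many more columns.
TOY_TABLE = (
    "chr\tpos\tref\talt\tgenename\taaref\taaalt\taapos\n"
    "toy1\t100\tC\tT\tGENE1\tR\tC\t27\n"
    "toy1\t100\tC\tA\tGENE1\tR\tS\t27\n"
    "toy1\t250\tG\tA\tGENE2\tR\tC\t27;45\n"
)


def find_protein_change(handle, gene, aaref, aapos, aaalt):
    """Return (chr, pos, ref, alt) for rows matching the gene and protein change.

    Assumption: `aapos` may hold several ';'-separated positions, one per transcript.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        positions = row["aapos"].split(";")
        if (row["genename"] == gene
                and row["aaref"] == aaref
                and row["aaalt"] == aaalt
                and str(aapos) in positions):
            hits.append((row["chr"], int(row["pos"]), row["ref"], row["alt"]))
    return hits


# Which genomic changes in GENE1 would produce R27C?
print(find_protein_change(io.StringIO(TOY_TABLE), "GENE1", "R", 27, "C"))
```

Several genomic changes can produce the same protein change, so expect a list of candidates here too.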