Question

Understanding dbSNP info on MAF, Alleles and HGVS Names

9.2 years ago

tonja.r ▴ 600

I am having trouble understanding SNPs and the related MAF, Alleles and HGVS Names fields in dbSNP.

I was looking at rs2476601. From the "Allele" information I understood that

Reference Allele is A and variation results in G; and
A is the minor allele and has a frequency of 2.7%

Does this mean that A, the reference allele, is the minor allele and G, the variant, is the major allele? Why is A called the 'reference allele' if it is the minor allele? Wouldn't it be more logical to call the major allele, here G, the reference allele? Or is the reference allele simply the allele present in the genome version used for mapping etc.?

From 'HGVS Names':

NC_000001.10:g.114377568A>G
NC_000001.11:g.113834946A>G
NG_011432.1:g.41808C=
NG_011432.1:g.41808C>T
NM_001193431.1:c.1858C=
NM_001193431.1:c.1858C>T

There are entries where A is changed to G, but also entries where C is changed to T. I do not understand how this is possible. If we look at the sense strand and see an 'A' that changed to 'G' somewhere, then on the antisense strand we will see a 'T' that changed to 'C'. So it must be T>C. Why do they show C>T then?

Thank you in advance.

Ram · Accepted Answer · 2015-02-25

The reference allele is always the allele that is in the reference genome. The reference genome is made up of contigs of sequence, all of which were taken from individuals, so the reference at any given position is whichever allele that person happened to carry at that locus. At various loci that allele happens to be the minor allele. Indeed, for any given variant, the probability that the reference contains the minor allele is roughly the MAF. Long term, the GRC (Genome Reference Consortium), who produce the reference genome, are attempting to switch the allele in the reference to whatever the major allele is at that locus, but this is very much in the long term.
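To make the "probability is the MAF" point concrete, here is a minimal sketch. It assumes the reference haplotype at a site is a single random draw from the same population in which the MAF was measured, which is a simplification. The function names and the seeded simulation are illustrative only and do not come from dbSNP or any library.

```python
import random


def prob_reference_is_minor(maf):
    """Under the single-random-draw assumption, P(reference carries the minor allele) = MAF."""
    return maf


def simulate_reference_is_minor(maf, n_draws=100_000, seed=0):
    """Draw n_draws independent haplotypes and return the fraction carrying the minor allele."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = sum(rng.random() < maf for _ in range(n_draws))
    return hits / n_draws


def expected_minor_sites(mafs):
    """Expected number of sites, out of those given, where the reference carries the minor allele."""
    return sum(mafs)


if __name__ == "__main__":
    maf = 0.027  # the MAF quoted for rs2476601 in the question
    print(prob_reference_is_minor(maf))
    print(simulate_reference_is_minor(maf))
```

With a MAF of 2.7%, a reference carrying the minor allele is uncommon at any one site, but across millions of variant sites it is expected to happen many times.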

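G/A vs C/T comes down to the alleles being reported on the opposite strand. Usually this occurs when someone reports against a wild-type gene sequence, which will often carry the major allele. They report the variant as major>minor on the reverse strand and submit this to the database; however, as we saw, the genomic ref/alt might actually be minor/major. Here the gene records (NG_011432.1, NM_001193431.1) have C as their reference base, which is the complement of the genomic G, and T is the complement of the genomic A. So C>T on the gene and A>G on the chromosome describe the same two alleles: the strand is flipped and the ref/alt order is swapped.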
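The strand flip plus ref/alt swap can be sketched in a few lines of Python. This is an illustration for single-base substitutions only. The helper name `describe_on_other_strand` is hypothetical, and the reference base of the opposite-strand record has to be supplied by the caller because it cannot be derived from the genomic description alone.

```python
# Watson-Crick complements for single bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(allele):
    """Return the complementary base for a single nucleotide."""
    return COMPLEMENT[allele]


def describe_on_other_strand(ref, alt, other_ref):
    """Re-express a ref>alt substitution against a record on the opposite strand.

    ref, alt  : alleles as written on the first strand (e.g. genomic A>G)
    other_ref : the base that the opposite-strand record uses as its reference
    Returns (other_ref, other_alt) as written on the opposite strand.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    alleles = {complement(ref), complement(alt)}
    if other_ref not in alleles:
        raise ValueError("other_ref is not one of the complemented alleles")
    (other_alt,) = alleles - {other_ref}
    return other_ref, other_alt


if __name__ == "__main__":
    # Genomic A>G, gene record whose reference base is C (complement of G).
    print(describe_on_other_strand("A", "G", "C"))  # ('C', 'T')
    # If the gene record had carried T instead, the description would be T>C.
    print(describe_on_other_strand("A", "G", "T"))  # ('T', 'C')
```

The second call is the T>C the question expected. It would apply only if the gene record carried the same allele as the genome reference, which is not the case here.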