Question: Understanding dbSNP information on MAF, alleles and HGVS names
5.8 years ago, tonja.r (490) wrote:

I have some problems understanding the concept of a SNP and the related MAF, alleles and HGVS names.

I was looking at rs2476601.
From the "Allele" information I understood that:
1. the reference allele is A and the variant allele is G;
2. A is the minor allele, with a frequency of 2.7%.

Does this mean that A, the reference allele, is the minor allele and G, the variant, is the major allele? If so, why is A called the 'reference allele' when it is the minor allele? Wouldn't it be more logical to make the major allele, in this case G, the reference allele? Or is the reference allele simply the allele present in the genome version used for mapping, etc.?
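A minimal sketch of the distinction being asked about, assuming the site is biallelic: "minor" and "major" are assigned purely from population frequency, while "reference" only means the allele present in the assembly, so the two labels are computed separately and may or may not coincide. Only the 2.7% figure comes from the dbSNP page; the G frequency is inferred as its complement.

```python
# Frequencies for rs2476601 as quoted above (alleles on the plus strand).
# Only A = 2.7% is from dbSNP; G is inferred as 1 - 0.027, assuming a biallelic site.
REFERENCE_ALLELE = "A"  # the allele present in the reference assembly
freqs = {"A": 0.027, "G": 1 - 0.027}


def minor_major(freqs: dict[str, float]) -> tuple[str, str]:
    """Return (minor, major) for a biallelic site, judged only by frequency."""
    minor = min(freqs, key=freqs.get)
    major = max(freqs, key=freqs.get)
    return minor, major


minor, major = minor_major(freqs)
maf = freqs[minor]
print(f"minor={minor} major={major} MAF={maf:.3f}")  # minor=A major=G MAF=0.027
# "Reference" and "minor/major" are independent labels; here they happen to coincide.
print("reference allele is the minor allele:", REFERENCE_ALLELE == minor)  # True
```

Nothing in the frequency calculation looks at which allele is in the assembly, which is why a reference allele can perfectly well be the minor one.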

From 'HGVS Names':

  • NC_000001.10:g.114377568A>G
  • NC_000001.11:g.113834946A>G
  • NG_011432.1:g.41808C=
  • NG_011432.1:g.41808C>T
  • NM_001193431.1:c.1858C=
  • NM_001193431.1:c.1858C>T
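To make the names in this list easier to compare, here is a small parser sketch. It is deliberately narrow (only single-nucleotide substitutions such as "A>G" and the "no change" form "C=", with the g. genomic and c. coding DNA prefixes), not a general HGVS parser; `parse_hgvs_snv` is a hypothetical helper. Each position and reference base is relative to its own reference sequence (chromosome record, RefSeqGene or transcript), which is why the numbers differ between names.

```python
import re

# Narrow pattern: only "ref>alt" substitutions and "ref=" (unchanged) at one base.
HGVS_SNV = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"  # RefSeq accession.version
    r":(?P<kind>[gc])\."                 # g. = genomic, c. = coding DNA
    r"(?P<position>\d+)"                 # position in that reference sequence
    r"(?P<ref>[ACGT])"                   # base in that reference sequence
    r"(?:>(?P<alt>[ACGT])|=)$"           # ">alt" or "=" (no change)
)


def parse_hgvs_snv(name: str) -> dict:
    match = HGVS_SNV.match(name)
    if match is None:
        raise ValueError(f"unsupported HGVS name: {name}")
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    # "C=" describes the allele identical to that reference sequence.
    fields["is_reference"] = fields["alt"] is None
    return fields


names = [
    "NC_000001.10:g.114377568A>G",
    "NC_000001.11:g.113834946A>G",
    "NG_011432.1:g.41808C=",
    "NG_011432.1:g.41808C>T",
    "NM_001193431.1:c.1858C=",
    "NM_001193431.1:c.1858C>T",
]
for name in names:
    f = parse_hgvs_snv(name)
    change = f"{f['ref']}=" if f["is_reference"] else f"{f['ref']}>{f['alt']}"
    print(f"{f['accession']:<16} {f['kind']}. {f['position']:>10} {change}")
```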

Some names describe A changing to G, while others describe C changing to T. I do not understand how this is possible. If we look at the sense strand and see an 'A' that varies to 'G', then on the antisense strand we should see a 'T' that varies to 'C'. So it should be T>C. Why do they show C>T?

Thank you in advance.

5.8 years ago, Emily_Ensembl (21k) wrote:

The reference allele is always the allele that is in the reference genome. The reference genome is assembled from contigs of sequence, each taken from a particular individual, so the reference at any given position is whichever allele that person happened to carry at that locus, and at some loci that allele is the minor allele. Indeed, for any given variant, the probability that the reference carries the minor allele is the MAF. In the long term, the GRC (Genome Reference Consortium), which produces the reference genome, aims to switch the reference at each locus to the major allele, but this is a long way off.
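The claim that the chance of the reference carrying the minor allele equals the MAF can be illustrated with a toy simulation. It rests on an idealising assumption that is not from dbSNP: at each locus the reference donor's haplotype is an independent random draw from the population. The MAF values below are made up for the illustration; `count_reference_minor` is a hypothetical helper.

```python
import random


def count_reference_minor(mafs: list[float], seed: int = 1) -> int:
    """Count loci where a simulated reference haplotype carries the minor allele.

    Assumption (idealised): the donor haplotype carries the minor allele at a
    locus with probability equal to that locus's MAF, independently per locus.
    """
    rng = random.Random(seed)
    return sum(1 for maf in mafs if rng.random() < maf)


# Hypothetical MAFs for 100,000 biallelic loci, drawn uniformly from [0, 0.5].
maf_rng = random.Random(42)
mafs = [maf_rng.uniform(0.0, 0.5) for _ in range(100_000)]

observed = count_reference_minor(mafs)
expected = sum(mafs)  # expected count is the sum of the per-locus MAFs
print(f"reference carries the minor allele at {observed} loci; expected about {expected:.0f}")

# With every locus at the MAF quoted in the question (2.7%), roughly 2.7% of loci match.
single = count_reference_minor([0.027] * 100_000)
print(f"fraction with MAF 0.027: {single / 100_000:.4f}")
```

In reality the assembly is a mosaic of a few donors rather than independent draws, so this only shows what the statement means under that simplification.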

The G/A vs C/T difference comes from people reporting the alleles on the opposite strand. This usually happens when someone reports against a wild-type gene sequence, which will often carry the major allele. They report the variant as major/minor on the reverse strand and submit it to the database; however, as we saw, on the reference genome the ref/alt may actually be minor/major.
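A small sketch of this strand argument. `revcomp` and `same_variant` are hypothetical helpers written for illustration, not part of any dbSNP tool; the sketch assumes, as the answer says, that the NM_ transcript is on the opposite strand to the chromosome record. Complementing the genomic A>G gives T>C, as the question expected, but the transcript's reference base C complements to G: C>T is the same pair of alleles with ref and alt swapped, because the transcript sequence carries the other allele.

```python
# Watson-Crick complement; reversing matters only for multi-base alleles.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (a single base is just complemented)."""
    return seq.translate(COMPLEMENT)[::-1]


def same_variant(pair_a: tuple[str, str], pair_b: tuple[str, str]) -> bool:
    """True if two (ref, alt) pairs contain the same two alleles, allowing the
    second pair to be reported on the opposite strand and/or with ref and alt
    swapped. Hypothetical helper for illustration only."""
    alleles_a = set(pair_a)
    return alleles_a == set(pair_b) or alleles_a == {revcomp(x) for x in pair_b}


genomic = ("A", "G")     # NC_000001.10:g.114377568A>G (chromosome record)
transcript = ("C", "T")  # NM_001193431.1:c.1858C>T (assumed opposite strand)

print(tuple(revcomp(x) for x in genomic))  # ('T', 'C'): A>G read from the other strand
print(revcomp(transcript[0]))              # G: transcript reference base on the genomic strand
print(same_variant(genomic, transcript))   # True
```

This kind of check cannot resolve strand for A/T or C/G SNPs, whose two alleles are complements of each other, so such a pair matches itself on both strands.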

5.4 years ago, erezts (0) wrote:

Thanks for the question and the answer. Personally, I still don't understand which allele is the wild-type allele, in other words, which one is more prevalent in the population. In the GeneView section I can see that A is on the positive strand. Is that the best way to determine the wild type?

G is called "ancestral". Does that mean it is the wild type? In the Population Diversity section I can see that G is the prevalent allele. Will the "ancestral" allele always be the prevalent allele?