Understanding dbSNP info on MAF, alleles and HGVS names
9.2 years ago
tonja.r ▴ 600

I am having trouble understanding the concept of a SNP and the related MAF, alleles and HGVS names.

I was looking at rs2476601. From the "Allele" information I understood that

  1. Reference Allele is A and variation results in G; and
  2. A is the minor allele and has a frequency of 2.7%

Does this mean that A, our reference allele, is the minor allele and G, our variant, is the major allele? Why is A called the 'reference allele' if it is the minor allele? Wouldn't it be more logical to call the major allele, in our case G, the reference allele? Or is the reference allele simply the allele present in the genome version used for mapping etc.?

From 'HGVS Names':

  • NC_000001.10:g.114377568A>G
  • NC_000001.11:g.113834946A>G
  • NG_011432.1:g.41808C=
  • NG_011432.1:g.41808C>T
  • NM_001193431.1:c.1858C=
  • NM_001193431.1:c.1858C>T

There are sequences where A is mutated to G, but also sequences where C is mutated to T. I do not understand how this is possible. If we look at the sense strand and see an 'A' there, which changed to 'G', it means that on the antisense strand we will see a 'T' which changed to 'C'. So it must be T>C. Why do they show C>T then?

Thank you in advance.

9.2 years ago
Emily 23k

The reference allele is always the allele that is in the reference genome. The reference genome is made up of contigs of sequence, all of which were taken from individuals, so the reference at any given position is whichever allele that person happened to have at that locus. At some loci, that allele happens to be the minor allele. Indeed, for any given variant, the probability that the reference contains the minor allele is the MAF. The GRC, who produce the reference genome, are attempting to switch the allele in the reference to whatever the major allele is at each locus, but this is very much a long-term goal.
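To make the "probability is the MAF" point concrete, here is a minimal simulation sketch. It assumes, as a simplification, that the reference haplotype at each locus is one random draw from the population and that loci are independent. The function name `fraction_reference_minor` and its parameters are made up for illustration; the 2.7% value is the MAF quoted in the question.

```python
import random


def fraction_reference_minor(maf, n_loci=100_000, seed=0):
    """Simulate n_loci independent loci that all share the same MAF.

    At each locus the reference allele is a single random draw from the
    population, so it is the minor allele with probability `maf`.
    Returns the fraction of loci where the reference is the minor allele.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_loci) if rng.random() < maf)
    return hits / n_loci


# MAF of 2.7%, as reported for rs2476601 in the question
frac = fraction_reference_minor(0.027)
print(f"Reference carries the minor allele at {frac:.2%} of simulated loci")
```

The simulated fraction should land close to the MAF itself, which is why a reference genome built from real individuals inevitably carries the minor allele at some loci.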

G/A vs C/T is down to people reporting the alleles on the opposite strand. This usually occurs when someone reports against a wild-type gene sequence, which will often carry the major allele. They report the variant as major/minor on the reverse strand and submit this to the database; however, as we saw, the ref/alt might actually be minor/major.
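As a rough sketch of how the two effects combine, the snippet below first complements the alleles to move to the opposite strand, then swaps ref and alt when the gene sequence carries the other allele as its own reference. The helper names are hypothetical and written only for this illustration. That the gene sequence is on the reverse strand and has C as its reference is an assumption read off the `c.1858C=` entry in the question.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(allele):
    """Reverse-complement an allele (for a single base, just its complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def describe_on_gene(ref, alt, gene_on_reverse_strand, gene_ref_is_genomic_alt):
    """Re-express a genomic ref>alt change relative to a gene sequence.

    gene_on_reverse_strand:  the gene sequence is the opposite strand
    gene_ref_is_genomic_alt: the gene sequence carries the genomic alt
                             allele as its own reference base
    """
    if gene_on_reverse_strand:
        ref, alt = complement(ref), complement(alt)
    if gene_ref_is_genomic_alt:
        ref, alt = alt, ref
    return ref, alt


# Genomic description from the question: g.114377568A>G
strand_only = describe_on_gene("A", "G", True, False)
strand_and_swap = describe_on_gene("A", "G", True, True)
print("Strand flip only:      %s>%s" % strand_only)      # T>C
print("Strand flip plus swap: %s>%s" % strand_and_swap)  # C>T
```

A strand flip alone gives T>C, which is what the question expected. C>T appears only once the gene sequence's own reference base is the complement of the genomic alternate allele.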

Thanks for the question and the answer. Personally, I still don't understand which allele is the wild type, in other words, which is more prevalent in the population. When I look at the GeneView section, I can see that A is on the positive strand. I guess that's the best way to determine the wild type?

G is called "ancestral". Does that mean wild type? In the Population Diversity section I can see that G is the prevalent allele. Will the "ancestral" allele always be the prevalent allele?

Thanks!

Erez
