Question: Definition of minor and major allele and connection with risk, effect, wildtype and reference allele
13 months ago by
m93150 wrote:

In the context of genotype data and/or NGS datasets, could someone provide a clear definition and differences between minor, major, risk, reference, wildtype and effect alleles? I find terms are very often interchanged without a clear definition in various software.

My current understanding is as follows:

  • major allele: the most common allele for a given SNP

  • minor allele: the less common allele for a SNP. The MAF is therefore the minor allele frequency. This measure can be used to get a rough idea of the variation of genotypes for a given SNP in a given population; in other words, it tells you how common this SNP is.

  • risk allele: in the context of a disease, this is the allele that confers a risk of developing the disease. Most of the time, risk allele = minor allele, as most people will not carry the risk allele. However, in some cases, the risk allele can in fact be the major allele.

  • effect allele: ??

  • reference allele: ?? Is this the major allele, i.e. the most common allele?

  • wildtype allele: ?? Is this the same as the reference allele?

Apologies if this is a really basic question, but after encountering all the various terms in different places, I am quite confused and in need of precise definitions. Many thanks.

major risk reference minor alleles • 5.4k views
modified 7 weeks ago by alhamidi.reem0 • written 13 months ago by m93150

Thank you Kevin for your explanations :)

written 7 weeks ago by alhamidi.reem0

You are welcome.

written 7 weeks ago by Kevin Blighe42k
13 months ago by
Kevin Blighe42k
Guy's Hospital, London
Kevin Blighe42k wrote:

major allele

"the most common allele for a given SNP"... in the cohort in question. The cohort may be just 10 people, though, or it could be 2,504 like in 1000 Genomes Phase III. In addition, the major allele, by definition, could have a frequency of 50.5%, in which case, although it is more frequent, it is only more frequent by 0.5%. The point that I want to make is that the major allele only makes sense when you understand the cohort in which it is the major allele, and also the size of that cohort.
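To put rough numbers on the cohort-size point, here is a minimal sketch that computes an approximate (Wald) 95% interval for an observed allele frequency of 50.5%. The function name `allele_freq_interval` is made up for illustration, and the calculation assumes diploid samples and simple binomial sampling of chromosomes, which ignores relatedness and population structure.

```python
import math

def allele_freq_interval(p, n_individuals, z=1.96):
    """Approximate (Wald) interval for an allele frequency p estimated
    from n_individuals diploid samples, i.e. 2 * n_individuals chromosomes.
    Hypothetical helper; assumes independent binomial sampling."""
    n_chrom = 2 * n_individuals
    se = math.sqrt(p * (1 - p) / n_chrom)
    # Clamp to the valid frequency range [0, 1]
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Same observed frequency, two very different cohort sizes
for n in (10, 2504):
    low, high = allele_freq_interval(0.505, n)
    print(f"N={n}: {low:.3f} to {high:.3f}")
```

Under these assumptions both intervals still span 50%, so at a frequency of 50.5% the 'major' label is not secure in either cohort; the interval is simply far wider with 10 people than with 2,504.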

minor allele

As above but, yes, the reverse, in that it is the less frequent allele. Also, yes, the MAF is the frequency of the minor allele and, from the MAF, one can infer the frequency of the major allele (1 − MAF) if it is a bi-allelic site (some sites are, understandably, tri- or quad-allelic).
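As a rough illustration of how major allele, minor allele and MAF fall out of the genotypes in a cohort, here is a minimal sketch. The helper `allele_summary` and the toy genotypes are invented for the example. For multi-allelic sites it takes the second most common allele as the 'minor' allele, which is one convention among several and is an assumption here.

```python
from collections import Counter

def allele_summary(genotypes):
    """Count alleles across diploid genotypes written like 'A/G' and
    return (major_allele, minor_allele, maf, frequencies).
    Hypothetical helper: 'minor' is taken as the second most common allele."""
    counts = Counter()
    for gt in genotypes:
        counts.update(gt.split("/"))
    total = sum(counts.values())
    freqs = {allele: n / total for allele, n in counts.items()}
    # Rank alleles from most to least frequent (ties keep first-seen order)
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    major = ranked[0]
    minor = ranked[1] if len(ranked) > 1 else None
    maf = freqs[minor] if minor is not None else 0.0
    return major, minor, maf, freqs

# Bi-allelic toy cohort: major allele frequency equals 1 - MAF
bi = allele_summary(["A/A", "A/G", "G/G", "A/A", "A/G"])
print(bi)

# Tri-allelic toy cohort: 1 - MAF no longer gives the major allele frequency
tri = allele_summary(["A/A", "A/G", "A/T", "G/G", "A/A"])
print(tri)
```

The same genotypes drawn from a different cohort could rank the alleles differently, which is why major and minor are always relative to the samples being counted.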

On what you said about the "variation of genotypes", if a site has a very low MAF in a global cohort (i.e. samples from various parts of the world), it may imply that the major allele is conserved and is 'fixed' in the human genome, but not necessarily. A very rare allele at such a site may, thus, be under selective pressure if it reflects positive gain of function, or it could be deleterious and more likely to be eliminated from the human lineage.

risk allele

What you said is correct. The risk allele is statistically significantly associated with risk of having a disease under study. Such an allele should have genome-wide significance and have an odds ratio > 1.0. A situation in which a major allele may be seen as the 'risk allele' is where the minor allele is found to be protective against disease by having an odds ratio < 1.0, coupled with a statistically significant p-value. However, such a situation is not usually interpreted from the context of the major allele being the risk allele.
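The relationship between a protective minor allele and a 'risk' major allele can be seen with a small sketch of an allelic odds ratio from a 2 × 2 table of allele counts. The function `allelic_odds_ratio` and the counts are hypothetical, the site is assumed to be bi-allelic, and no significance test is included, so this only shows the direction of the effect.

```python
def allelic_odds_ratio(case_effect, case_other, control_effect, control_other):
    """Odds ratio for the allele treated as the 'effect' allele, from
    allele counts in cases and controls (hypothetical helper)."""
    return (case_effect * control_other) / (case_other * control_effect)

# Made-up allele counts at a bi-allelic site
cases = {"minor": 30, "major": 170}
controls = {"minor": 60, "major": 140}

# Minor allele treated as the effect allele
or_minor = allelic_odds_ratio(cases["minor"], cases["major"],
                              controls["minor"], controls["major"])

# Same table, but with the major allele treated as the effect allele
or_major = allelic_odds_ratio(cases["major"], cases["minor"],
                              controls["major"], controls["minor"])

print(f"OR (minor allele): {or_minor:.2f}")
print(f"OR (major allele): {or_major:.2f}")
```

With these counts the minor allele has an odds ratio below 1.0 and the major allele has the reciprocal value above 1.0. It is the same association described from the other allele's point of view.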

You may have been thinking about rare and common (MAF>5%) variants. For example, it is accepted (by those who actually think) that common alleles have roles in disease. One example is the variants in the CCND1 locus, which have MAFs of ~15% in Caucasians but which confer increased risk of ER+ breast cancer. See "Rare and common variants: twenty arguments" for further reading.

I should add that many rare variants may be functionless, but that they can still accumulate in the human genome and eventually become functional if combined with other nearby variants. For example, variants accumulated over time may eventually form novel transcription start sites (TSSs), TF binding sites, histone binding sites, protein binding sites, etc.


In relation to the above 3, you may enjoy reading a recent answer that I gave: A: SNP dataset and Z Score

effect allele

This isn't used that much. It is essentially the allele whose effects in relation to disease are being studied, i.e. the allele to which a reported effect size (odds ratio or beta) refers. The effect allele is therefore usually, though not invariably, the minor allele.

reference allele

If you hear this term, exercise caution. The best way to view it is as the allele that is in a particular reference build, e.g., GRCh37 / hg19, GRCh38 / hg38, etc. In some cases, however, the reference allele can be a risk allele. Read here for further information: A: Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy.
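To make the distinction between reference allele and major allele concrete, here is a minimal sketch that starts from the ALT allele frequency in a cohort and reports which of REF or ALT is the minor allele. The function `describe_site` is a made-up helper, the site is assumed to be bi-allelic, and the frequencies are illustrative only.

```python
def describe_site(alt_allele_freq):
    """Given the ALT allele frequency in a cohort, return (maf, minor_label)
    for a bi-allelic site, where minor_label is 'REF' or 'ALT'.
    Hypothetical helper; an exact 50/50 tie is arbitrarily reported as 'ALT'."""
    ref_freq = 1.0 - alt_allele_freq
    if alt_allele_freq > ref_freq:
        # The reference-build allele is the rarer one in this cohort
        return ref_freq, "REF"
    return alt_allele_freq, "ALT"

for af in (0.10, 0.80):
    maf, minor = describe_site(af)
    print(f"ALT frequency {af:.2f}: MAF = {maf:.2f}, minor allele is {minor}")
```

When the ALT frequency is above 0.5, the allele in the reference build is the minor allele in that cohort, which is the situation described in the linked thread.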

wildtype allele

Not the same as the reference allele. A wildtype allele is specific to your case-control study and is merely the allele that is present in your wild-type samples. This could feasibly be a minor allele, or anything else - it's specific to your study and what you view as the wild-type condition.

Thank you.


modified 7 weeks ago • written 13 months ago by Kevin Blighe42k

Thank you so much! I should have added "ancestral" allele in my question. Am I correct in saying the ancestral allele is the major allele, given a large reference population?

written 13 months ago by m93150

Yes, the ancestral allele would be the major allele. Again, however, due to the 'quirks' of the reference genome builds, the ancestral allele is not always the allele that appears in the reference genome. The reference genome has many thousands of rare alleles, at least in the case of hg19.

Edit: note that 'ancestral' will be interpreted differently depending on who you talk to. Here is another definition that more or less refers to the major allele:

A situation could arise, though, where a rare allele could confer gain of function and, therefore, it would eventually become more frequent than the ancestral allele. This is obviously over many many generations, though, and is more in the realm of evolution.

modified 13 months ago • written 13 months ago by Kevin Blighe42k

Thank you so much for clarifying all this for me!

written 13 months ago by m93150