Question: Getting Risk Allele from GWAS Odds Ratio?
Mike Dacre (Stanford, CA) wrote, 3.3 years ago:

I understand the basic principle of how to calculate an odds ratio in a GWAS, but I am trying to figure out how it is actually done in most large-scale GWAS, to see if it is possible to go back from this to a risk allele.

I am looking through a large number of GWA studies, and most of them give a p-value and odds ratio for associated SNPs, but very, very few give the risk allele. Is there any way to go back from an odds ratio or likelihood ratio to a risk allele? Is it standard practice to always calculate the minor allele as allele 1 and the major allele as allele 2 in GWAS? From the few studies I have found that give the risk allele as well as the OR, that does not seem to be the case (I wish it were).
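
To make the ambiguity concrete, here is a minimal sketch of an allelic odds ratio computed from allele counts. The counts and the function name `allelic_odds_ratio` are hypothetical, and real GWAS pipelines typically fit a regression model rather than using a plain 2x2 table. The point is only that swapping which allele is coded as allele 1 inverts the OR, so an OR on its own does not say which allele it refers to.

```python
def allelic_odds_ratio(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Odds ratio for allele 1 relative to allele 2, from allele counts."""
    return (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)

# Hypothetical allele counts for a biallelic A/G SNP (not from any study).
case_A, case_G = 300, 700
ctrl_A, ctrl_G = 200, 800

# OR with A coded as allele 1, then with G coded as allele 1.
or_A = allelic_odds_ratio(case_A, case_G, ctrl_A, ctrl_G)
or_G = allelic_odds_ratio(case_G, case_A, ctrl_G, ctrl_A)

# The two values are reciprocals: the same data give OR > 1 or OR < 1
# depending only on which allele was chosen as the coded allele.
print(or_A, or_G, or_A * or_G)
```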


statistics gwas • 3.4k views

Interested in responses on this. I'd think that fundamentally there is no way to reliably infer the risk allele given only the p-value and odds ratio for a SNP. I think it's a common, but certainly not universal, convention that the minor allele is used as the effect allele in the software commonly used to compute associations.

written 3.3 years ago by Ahill

Yes, unfortunately I think that you are right. If the authors do not explicitly include their method for choosing the effect allele in the OR calculation, there is no way to be sure. It is just strange to me that the risk allele would not be included in the results... I had hoped that this was because there was some obvious way of figuring that out from the other data, but that does not seem to be the case.

written 3.3 years ago by Mike Dacre
Mike Dacre (Stanford, CA) wrote, 3.3 years ago:

Based on my work over the last few days, I can say that in many cases it just isn't possible to find the risk allele from published GWAS data. Many studies do not publish enough information: they either skip allele information altogether and publish only p-values, or they publish an odds ratio or beta without stating how it was calculated.

However, there are a few tricks/heuristics that can be used to get the data:

  • Sometimes the way the OR was calculated (e.g. with respect to the minor allele) is given in the methods or in the table description. The allele they identify as the coded allele is then allele1 for the OR calculation, and thus the risk allele is the coded allele if the OR is greater than 1.
  • A large number of studies provide an allele1, an allele2, and an OR. In these cases the OR was almost always calculated with respect to allele1, and thus the risk allele is allele1 if the OR is greater than 1. However, as the authors give no explicit information in this case, you need to take this data with a large grain of salt; you could easily have it backwards.
  • Many studies give the MAF for both cases and controls along with the minor allele, sometimes with an OR as well. In this case the risk allele is the minor allele if the MAF is higher in cases than in controls. When studies report both the MAFs and the OR, in my experience of a few dozen studies, the MAF comparison and the OR direction always match, which gives me more confidence in the previous heuristic (assuming the OR refers to allele1).
  • Sometimes studies do not provide the minor allele, but just provide the MAFs. In these cases you can still get the risk allele if you can query the SNP on dbSNP. The key is that the population studied should be the same as the population in the dbSNP data, and dbSNP should list only one alternate allele for that SNP. Sometimes there are multiple alternate alleles, and in that case you can never be confident that you have the right minor allele. Where there is clearly one major and one minor allele for your population, though, you can use that to pick the risk allele using the MAF comparison described in the previous point.
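
The heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the SNP is assumed to be biallelic (so the "other" allele is the risk allele when the OR is below 1), and the OR is assumed to have been calculated with respect to allele1, which, as noted, a study may not actually guarantee.

```python
from typing import Optional

def risk_allele_from_or(allele1: str, allele2: str, odds_ratio: float) -> Optional[str]:
    """Heuristic: assume the OR was calculated with respect to allele1.

    Assumes a biallelic SNP, so an OR below 1 points to allele2.
    Returns None when the OR is exactly 1 (no direction).
    """
    if odds_ratio > 1:
        return allele1
    if odds_ratio < 1:
        return allele2
    return None

def risk_allele_from_maf(minor: str, major: str,
                         maf_cases: float, maf_controls: float) -> Optional[str]:
    """Heuristic: the minor allele is the risk allele if it is more
    frequent in cases than in controls; otherwise the major allele is.
    Returns None when the two frequencies are equal.
    """
    if maf_cases > maf_controls:
        return minor
    if maf_cases < maf_controls:
        return major
    return None

def calls_agree(from_or: Optional[str], from_maf: Optional[str]) -> bool:
    """Cross-check: True only if both heuristics gave the same allele."""
    return from_or is not None and from_or == from_maf

# Hypothetical example values, not taken from any real study.
by_or = risk_allele_from_or("T", "C", 1.35)
by_maf = risk_allele_from_maf("T", "C", maf_cases=0.25, maf_controls=0.18)
print(by_or, by_maf, calls_agree(by_or, by_maf))
```

When a study reports both an OR and case/control MAFs, disagreement between the two calls is a useful flag that the OR may have been calculated with respect to the other allele.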

Using these methods I am able to get the risk allele for around 80% of the studies that I have searched, which isn't bad. I think only around 10% actually explicitly state what the risk/coded/effect allele is, which is kind of mindblowing to me.
