
Question

Polygenic Risk Scores: Odds ratio or Beta-coefficient?

6.8 years ago

Volka ▴ 180

Hi all, I have recently started a project modelling a polygenic risk score to evaluate its utility in predicting a certain disease. After some reading, I have come across various unweighted and weighted genetic risk score models.

I am wondering about using the odds ratio (OR) versus the beta coefficient of each SNP as its weight in a risk score model. For instance, here they used the beta coefficient in their model, while here they used the odds ratio. Is there any difference between using the odds ratio and the beta coefficient in a risk score model? Also, I noticed that some papers use log(OR) rather than ln(OR). Is there a major difference between the two?

Thanks!

Tags: polygenic, genetic risk score, odds ratio

Written 6.8 years ago by Volka; updated 4.0 years ago by willebaldo.garcia.m

5.2 years ago

Mike ▴ 30

I'm a student doing research in this area, and after a lot of reading I'm fairly sure you want to use the log odds ratios (the betas) as the weights in your model. The beta is the weight that belongs in the score, because a PRS is a sum and log ORs add up, even though the OR is reported more often, I believe because it is easier for humans to interpret. Also, when people write log(OR) they usually mean ln(OR): in this field, and in bioinformatics generally, the natural logarithm appears to be the default, so unless someone writes log10, they probably mean ln.
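To make this concrete, here is a minimal R sketch of an unweighted and a beta-weighted score. The dosage matrix, SNP names and ORs are hypothetical values made up for illustration, not from a real GWAS. The weighted score is a sum of dosage × ln(OR), and exponentiating it gives the product of the per-SNP ORs (the usual multiplicative model), which is why the betas, not the ORs, are what you add up.

```r
# Hypothetical data: effect-allele dosages (0/1/2) for 3 individuals x 4 SNPs.
# SNP names and ORs are made up for illustration, not taken from a real GWAS.
dosage <- matrix(c(0, 1, 2, 1,
                   2, 2, 0, 1,
                   1, 0, 1, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("ind", 1:3), paste0("snp", 1:4)))
or <- c(snp1 = 1.20, snp2 = 0.85, snp3 = 1.10, snp4 = 1.35)

# Convert ORs to betas (natural log); protective alleles (OR < 1) get beta < 0
beta <- log(or)

# Unweighted score: count of effect alleles per individual
prs_unweighted <- rowSums(dosage)

# Weighted score: sum over SNPs of dosage * beta
prs_weighted <- drop(dosage %*% beta[colnames(dosage)])

# Exponentiating the weighted score gives the combined OR relative to carrying
# no effect alleles: the product of OR^dosage across SNPs
combined_or <- exp(prs_weighted)
combined_or_check <- apply(dosage, 1, function(d) prod(or^d))
all.equal(combined_or, combined_or_check)
```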

Hope this helps!

Mike

Indeed

Reply by Kevin Blighe, 5.2 years ago

So when you say he should use the log odds ratio here, do you mean log10 or ln?

Reply by zhoufeng2ye, 4.5 years ago

We mean the original beta coefficients, i.e. the 'Estimate' column in the summary table of any regression model in R. For example:

library('MASS')
data('menarche')
fit <- glm(cbind(with = Menarche, without = Total - Menarche) ~ Age,
  family = binomial(link = 'logit'), data = menarche)
summary(fit)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -21.22639    0.77068  -27.54   <2e-16 ***
Age           1.63197    0.05895   27.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Then we can extract the log odds ratio / beta coefficient / estimate and convert to OR:

coef(fit)['Age']
     Age 
1.631968 

OR <- exp(coef(fit)['Age'])
OR
     Age 
5.113931

Then, if you already have the OR and want to convert it back to a log odds ratio / beta coefficient / estimate:

log(OR, base = exp(1))
     Age 
1.631968

Reply by Kevin Blighe, 4.5 years ago

I mean ln, which is what R's log() function computes by default.
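A quick sketch of how the two logarithms relate in R (the OR of 1.5 is a hypothetical value). Because ln(x) = log10(x) × ln(10), log10 weights only rescale every score by the same constant, so individuals rank identically; however, exp() of a log10-scale score is no longer an odds ratio, so the base matters when you back-transform.

```r
or <- 1.5  # hypothetical odds ratio

ln_or    <- log(or)                  # natural log: R's default base is e
ln_or2   <- log(or, base = exp(1))   # the same thing, with the base written out
log10_or <- log10(or)                # base-10 log

# The two scales differ only by the constant factor ln(10) (about 2.303)
all.equal(ln_or, log10_or * log(10))

# Back-transforming requires the matching inverse
all.equal(exp(ln_or), or)
all.equal(10^log10_or, or)
```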

Reply by Mike, 4.4 years ago

4.0 years ago

willebaldo.garcia.m ▴ 20

Also, your choice of weight, beta or OR, determines the units of the individuals' PRS. The association (summary statistics) input dataset and its units affect both the range of the PRS distribution and the risk units. The units also depend on whether the studied trait is binary or quantitative.

https://doi.org/10.1038/s41596-020-0353-1
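A minimal R sketch of why raw ORs make poor additive weights for a binary trait (both SNPs and their ORs are hypothetical). On the beta scale, a risk allele with OR 2 and a protective allele with OR 0.5 cancel out, matching 2 × 0.5 = 1 on the odds scale; summing the ORs instead gives every allele a positive weight, so the protective allele raises the score. For a quantitative trait, the betas are per-allele changes in the phenotype, so the PRS is in trait units instead.

```r
# Hypothetical: one risk SNP (OR = 2) and one protective SNP (OR = 0.5),
# and an individual carrying one effect allele at each
or     <- c(risk = 2, protective = 0.5)
dosage <- c(risk = 1, protective = 1)

# Beta (ln OR) weights: the effects cancel, matching 2 * 0.5 = 1 on the odds scale
prs_beta <- sum(dosage * log(or))

# Raw OR weights: every allele adds a positive amount,
# so the protective allele pushes the score up as well
prs_or <- sum(dosage * or)

c(prs_beta = prs_beta, prs_or = prs_or)
```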

Accepted Answer · 2018-09-06

Hey,

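The odds ratio (OR) is the exponential of the beta coefficient: OR = exp(beta). The beta coefficient itself is the change in the log odds of the outcome per unit increase in the exposure; equivalently, each unit increase in the exposure multiplies the odds by the OR. A practical example will explain it better: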
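A minimal sketch of where OR = exp(beta) comes from, using a hypothetical logistic model with a made-up intercept b0 and slope b1: the ratio of the odds at x + 1 to the odds at x is the same wherever you start, and equals exp(b1).

```r
# Hypothetical logistic model: logit(p) = b0 + b1 * x (values made up)
b0 <- -1.0
b1 <- 0.4

# Odds of the outcome at exposure value x
odds_at <- function(x) {
  p <- plogis(b0 + b1 * x)  # inverse logit: P(outcome | x)
  p / (1 - p)
}

# Odds ratio for a one-unit increase, evaluated at two different baselines
or_from_0   <- odds_at(1) / odds_at(0)
or_from_2.5 <- odds_at(3.5) / odds_at(2.5)

c(or_from_0 = or_from_0, or_from_2.5 = or_from_2.5, exp_b1 = exp(b1))
```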

create random data with a condition A + B ('outcome') and gene1 ('exposure')

# no seed is set, so your values will differ from those shown below
modeling <- data.frame(
    condition=factor(c(rep("A",100), rep("B",100)), levels=c("A", "B")),
    gene1=c(runif(100), runif(100)))
head(modeling)
  condition     gene1
1         A 0.3607443
2         A 0.3268301
3         A 0.4237005
4         A 0.7621534
5         A 0.1456797
6         A 0.3201094

Note that we have set A as the reference level.
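A minimal sketch of what the reference level does, assuming an arbitrarily chosen seed so the run is reproducible: refitting with B as the reference makes the model describe A versus B, so the beta changes sign and the OR becomes its reciprocal. The same applies in PRS work, where the sign of each weight depends on which allele is taken as the effect allele.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
modeling <- data.frame(
  condition = factor(c(rep("A", 100), rep("B", 100)), levels = c("A", "B")),
  gene1 = runif(200))

# A as reference: the model describes the log odds of B versus A
fit_AB <- glm(condition ~ gene1, data = modeling,
              family = binomial(link = "logit"))

# B as reference: the model now describes the log odds of A versus B
modeling$condition_BA <- relevel(modeling$condition, ref = "B")
fit_BA <- glm(condition_BA ~ gene1, data = modeling,
              family = binomial(link = "logit"))

beta_AB <- unname(coef(fit_AB)["gene1"])
beta_BA <- unname(coef(fit_BA)["gene1"])

# The beta flips sign, so the OR becomes its reciprocal
c(beta_AB = beta_AB, beta_BA = beta_BA,
  OR_AB = exp(beta_AB), OR_BA = exp(beta_BA))
```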

create a binomial logistic regression model, with gene1's expression 'predicting' the outcome

model <- glm(condition ~ gene1, data=modeling, family=binomial(link='logit'))
summary(model)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.03767    0.30014  -0.126    0.900
gene1        0.07294    0.51258   0.142    0.887

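Here, the beta coefficient for gene1, for condition B versus A, is 0.07294: each one-unit increase in gene1 expression raises the log odds of being in condition B by 0.07294. So, higher gene1 expression is (very weakly) associated with condition B; if the association ran the other way, the beta coefficient would be negative. This is not a statistically significant finding, though (p = 0.887).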
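To see what these estimates mean on the probability scale, here is a small sketch that plugs the printed estimates into the inverse logit, plogis(). gene1 was drawn with runif(), so 0 to 1 covers its range; the predicted probability of condition B barely moves across it, while the difference in log odds equals the beta (up to rounding).

```r
# Estimates copied from the summary(model) output above
b0 <- -0.03767  # (Intercept)
b1 <-  0.07294  # gene1

# Predicted probability of condition B at the two ends of gene1's range
p_gene1_0 <- plogis(b0)        # gene1 = 0
p_gene1_1 <- plogis(b0 + b1)   # gene1 = 1

round(c(p_gene1_0 = p_gene1_0, p_gene1_1 = p_gene1_1), 3)

# On the log-odds scale the difference is the beta
qlogis(p_gene1_1) - qlogis(p_gene1_0)
```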

We can also test gene1 with a Wald test on its beta coefficient:

library(aod)  # wald.test() comes from the aod package
wald.test(b=coef(model), Sigma=vcov(model), Terms=2)
Wald test:
----------
Chi-squared test:
X2 = 0.02, df = 1, P(> X2) = 0.89
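The Wald statistic can also be computed by hand from the printed estimate and standard error, which shows that it is just the squared z value from the summary table. A sketch using the numbers from summary(model):

```r
# Estimate and standard error copied from the summary(model) output above
beta <- 0.07294
se   <- 0.51258

z  <- beta / se                                # Wald z, as in the summary table
X2 <- z^2                                      # Wald chi-squared statistic, 1 df
p  <- pchisq(X2, df = 1, lower.tail = FALSE)   # equals 2 * pnorm(-abs(z))

round(c(z = z, X2 = X2, p = p), 3)
```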

obtain the odds ratio and the lower / upper limits of its 95% confidence interval (CI):

exp(cbind(OR=coef(model), confint(model, level = 0.95)))
                   OR     2.5 %   97.5 %
(Intercept) 0.9630288 0.5333548 1.737047
gene1       1.0756700 0.3930383 2.949061

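So, the odds ratio is just 1.08, which, as you can tell, is not large and reflects only a slight increase in the odds of B per unit increase in gene1; its 95% CI (0.39 to 2.95) also spans 1, consistent with p = 0.887.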
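For comparison, a Wald-type interval can be built directly from the beta and its standard error and then exponentiated. This is a sketch using the printed values; note that confint() on a glm profiles the likelihood, so its limits differ slightly from this Wald interval, whereas confint.default() returns the Wald version.

```r
# Estimate and standard error copied from the summary(model) output above
beta <- 0.07294
se   <- 0.51258

z_crit <- qnorm(0.975)                    # ~1.96 for a 95% interval
ci_log <- beta + c(-1, 1) * z_crit * se   # Wald CI on the log-odds scale

# Back-transform the estimate and both limits to the OR scale
round(exp(c(OR = beta, lower = ci_log[1], upper = ci_log[2])), 3)
```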

log OR

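The log OR is just the natural logarithm of the OR, i.e. the beta coefficient itself (log(1.0756700) = 0.07294). There are probably many reasons to work with the log OR rather than the OR. One is that the log OR is approximately normally distributed, with a CI that is roughly symmetric around it, so we can recover its standard error from the CI and then calculate the Z score: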

OR <- 1.0756700
lowerCI <- 0.3930383
upperCI <- 2.949061

logOR <- log(OR)
logORlowerCI <- log(lowerCI)
logORSE <- (logOR - logORlowerCI) / 1.96

Then calculate Z:

logOR / logORSE
[1] 0.1420052
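When a GWAS reports only the OR and its 95% CI, the same idea gives you beta weights and standard errors for a PRS. Below is a hypothetical helper, or_ci_to_beta() (an illustrative name, not from any package), that uses both CI limits rather than just the lower one. It assumes a Wald interval that is symmetric on the log scale, so with the profile-likelihood CI above, the SE comes out close to, but not exactly equal to, the 0.51258 from summary(model).

```r
# Hypothetical helper (not from any package): recover beta, SE, Z and P
# from a reported OR and its CI. Assumes a Wald interval, i.e. one that is
# symmetric around ln(OR) on the log scale.
or_ci_to_beta <- function(or, lower, upper, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)                # 1.96 for a 95% CI
  beta   <- log(or)                                   # natural log
  se     <- (log(upper) - log(lower)) / (2 * z_crit)  # half-width / z_crit
  z      <- beta / se
  p      <- 2 * pnorm(-abs(z))                        # two-sided P value
  c(beta = beta, se = se, z = z, p = p)
}

# The OR and profile-likelihood CI from the example above
res <- or_ci_to_beta(1.0756700, 0.3930383, 2.949061)
round(res, 3)
```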

Kevin