Question

How to calculate the increase in qualitative disease score per risk allele?

1

9.2 years ago

analyticalavailable ▴ 30

Hi,

I'd like to calculate the increase in qualitative disease score per risk allele. I have a cohort of 1000 individuals and a single SNP, and for each individual I have a score from 0 to 100. It was recommended that I use a linear regression model.

Can anyone elaborate on why you might use a linear regression model for this? Are there any other models that would fit this task? Suggestions on R packages I could use are welcome.

Thanks

SNP R • 2.7k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by analyticalavailable ▴ 30

Ram · Accepted Answer · 2015-02-04

2

9.2 years ago

Devon Ryan 104k

The versions of linear regression (well, of a linear model, really) that you're probably already familiar with are ANOVAs and t-tests. Your problem is essentially an ANOVA, possibly with a partial factorial design (depending on how you want to set things up).

BTW, the alternative to this would be to divide the scores by 100 and use logistic regression, which gets around some of the ceiling and floor effects you can run into in more complex models. Of course, if you essentially only have two genotype groups, then a Wilcoxon rank-sum test (aka Mann-Whitney U test) could work, but that only makes sense if you have exactly two groups. It's essentially just a non-parametric t-test.
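A rough sketch of those two alternatives, on simulated data. The group sizes and means are made up for illustration, and using the `quasibinomial` family (so that `glm()` accepts non-integer responses between 0 and 1) is one assumed way of doing "logistic regression on score/100", not the only one.

```r
set.seed(1)  # simulated data; the group means are invented for illustration
d <- data.frame(
  genotype = factor(rep(c("GG", "GA", "AA"), c(500, 350, 150)),
                    levels = c("GG", "GA", "AA")),
  val = c(rnorm(500, 20, 20), rnorm(350, 40, 20), rnorm(150, 50, 20))
)
d$val <- pmin(pmax(d$val, 0), 100)  # clamp scores to the 0-100 range

# Logistic-type fit on score/100; quasibinomial allows fractional responses
fit_frac <- glm(I(val / 100) ~ genotype, family = quasibinomial, data = d)
summary(fit_frac)

# Wilcoxon rank-sum test, restricted to two groups (here GG vs. GA)
two <- droplevels(subset(d, genotype %in% c("GG", "GA")))
w <- wilcox.test(val ~ genotype, data = two)
w$p.value
```

The coefficients of the logistic-type fit are on the log-odds scale, so they are not directly "score points per genotype" the way `lm()` coefficients are.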

Anyway, graph the data and have a look. Then fit it with lm() in R and plot that too to see if the fit seems reasonable.
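Something along these lines would do for the "graph it, fit it, plot the fit" step. This is only a sketch on simulated data (the column names `genotype` and `val` and the group means are assumptions, not your real data).

```r
set.seed(1)  # simulated data; the group means are invented for illustration
d <- data.frame(
  genotype = factor(rep(c("GG", "GA", "AA"), c(500, 350, 150)),
                    levels = c("GG", "GA", "AA")),
  val = c(rnorm(500, 20, 20), rnorm(350, 40, 20), rnorm(150, 50, 20))
)
d$val <- pmin(pmax(d$val, 0), 100)  # clamp scores to the 0-100 range

# Look at the raw data first
boxplot(val ~ genotype, data = d, xlab = "Genotype", ylab = "Score")

# Fit the linear model and overlay the fitted value for each genotype
fit <- lm(val ~ genotype, data = d)
new <- data.frame(genotype = factor(levels(d$genotype),
                                    levels = levels(d$genotype)))
points(seq_len(nrow(new)), predict(fit, new), pch = 19, col = "red")

# Residuals-vs-fitted plot to judge whether the fit seems reasonable
plot(fit, which = 1)
```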

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k

1

OK, I'm kind of new to this area, so I am coming in a little blind. I have a score that varies between 0 and 100 for 1000 people. For each of those, I have the genotype coded with GG (most common homozygote) as 0, GA as 1 and AA (least common homozygote) as 2.

What I am gathering from online is that I need to break each genotype down into its own covariate in the model in order to get the differences between the groups in terms of the score. So I figured I would look into a multiple linear regression model in SPSS, with, say, one column for the score and then one for each of the genotypes: the score being the dependent variable and the genotypes being the three explanatory variables.

Firstly, there will be a different number of values for each genotype, for example GG n=500, GA n=350 and AA n=150. If I were to set the score as my dependent variable (n=1000), how does that work, since the score wouldn't correspond across the row? This differs from a more standard model, where you would have, say, earnings as the dependent variable (n=1000) and education level (n=1000) and experience (n=1000) as your explanatory variables. Do you get what I mean?

updated 2.0 years ago by Ram 43k • written 9.2 years ago by analyticalavailable ▴ 30

2

I'll give an example using R, since I don't use SPSS. There are actually two ways to go about this:

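# Simulate scores for 500 GG, 350 GA and 150 AA individuals (GG is the reference level)
d <- data.frame(genotype = factor(c(rep("GG", 500), rep("GA", 350), rep("AA", 150)), levels = c("GG", "GA", "AA")),
                val = c(rnorm(500, 20, 20), rnorm(350, 40, 20), rnorm(150, 50, 20)))
# Clamp the simulated scores to the 0-100 range
d$val[d$val < 0] <- 0
d$val[d$val > 100] <- 100
# One row per individual; each non-reference genotype gets its own coefficient
summary(lm(val ~ genotype, d))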
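On the data layout question: you keep one row per individual, with a single genotype column and a single score column, and R expands the genotype factor into indicator (dummy) columns for you. A small sketch with hypothetical toy values (the six rows below are invented purely to show the mechanics):

```r
# Hypothetical toy data, one row per individual
toy <- data.frame(
  genotype = factor(c("GA", "GA", "GG", "AA", "GG", "AA"),
                    levels = c("GG", "GA", "AA")),
  val = c(27, 57, 87, 15, 60, 35)
)

# The design matrix lm() builds: an intercept plus one 0/1 column per
# non-reference genotype, each with as many rows as there are individuals
X <- model.matrix(~ genotype, data = toy)
X

fit <- lm(val ~ genotype, data = toy)
coef(fit)                            # intercept = GG mean; others = difference from GG
tapply(toy$val, toy$genotype, mean)  # group means, for comparison
```

So the unequal group sizes are not a problem: every explanatory column still has one entry per person, it is just that different numbers of people have a 1 in each column.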

The other way is to make this an additive model so that you can estimate a homozygous A interaction (i.e., whether there's a simple additive effect of the genotype or whether two copies of A produce a non-linear effect).
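A minimal sketch of that additive version, again on simulated data. The variable names `n_A` and `homAA` are just ones I've made up here: `n_A` counts copies of the A allele, and `homAA` flags AA individuals so that its coefficient captures any departure from a purely additive trend.

```r
set.seed(1)  # simulated data; the group means are invented for illustration
d <- data.frame(
  genotype = factor(rep(c("GG", "GA", "AA"), c(500, 350, 150)),
                    levels = c("GG", "GA", "AA")),
  val = c(rnorm(500, 20, 20), rnorm(350, 40, 20), rnorm(150, 50, 20))
)
d$val <- pmin(pmax(d$val, 0), 100)  # clamp scores to the 0-100 range

# Number of A alleles: GG = 0, GA = 1, AA = 2
d$n_A <- as.integer(d$genotype) - 1L

# Purely additive model: the slope is the change in score per A allele
additive <- lm(val ~ n_A, data = d)
coef(additive)["n_A"]
confint(additive, "n_A")

# Additive trend plus an indicator for homozygous A
d$homAA <- as.integer(d$genotype == "AA")
with_hom <- lm(val ~ n_A + homAA, data = d)
summary(with_hom)

# Does allowing a non-additive AA effect improve the fit?
anova(additive, with_hom)
```

The `n_A` coefficient of the first model is the per-allele increase in score that the question asks about. If the `homAA` term is not needed, that single slope is a reasonable summary.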

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k

0

I actually prefer R, so thank you very much for this. I believe that it is the additive model that I am looking for right now, since I would like to see what the interaction with homozygous A is.

Would you be able to suggest the function/library that is used to do that kind of analysis?

Data is currently organised something like the following:

Col1      Col2
GA        27.0
GA        57.0
GG        87.0
AA        15.0

I'll play around with the R implementation and try to get that working, but I am new to it.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by analyticalavailable ▴ 30

0

Running both lm and gam, I find that I am not getting the results that I need from the summary/anova output. What I am after is a result reported in this form:

(per G allele IRR 0.89, 95% confidence interval [95% CI] 0.82, 0.97; PLR ? 0.002)

So, in my case I should be seeing a percentage lower risk of the disease for AA over GA and GG.

So in the above case, GA individuals have an 11% lower risk than AA individuals.

Any idea on where I should look for that information?

updated 2.0 years ago by Ram 43k • written 9.2 years ago by analyticalavailable ▴ 30

0

Not off-hand, no.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k