Question: Regression models on genetic data
melania 2282100 wrote:

Hello,

I am using unconditional logistic regression to model the genetic effect and the genetic × environmental exposure interaction effect on my outcome.

My results are a bit strange:

When modelling only the main effect of the variants, no SNP is associated.

When modelling the interaction term exposure:SNP together with the additive term, I get a strong significant signal only for the additive term (two SNPs with p << 10e-8) and nothing for the interaction.

I am using 0, 1 and 2 coding for the SNP (effect allele) and a continuous exposure variable.

I am working on a case-control study (2300 subjects) and testing 7000 SNPs.

Can this be a reliable result? How could this be explained?

Thank you very much !

snp R • 327 views
modified 17 months ago by Lemire600 • written 17 months ago by melania 2282100

What, precisely, is your model formula? Is it `outcome ~ exposure:SNP + SNP`?

Working with regression models can be difficult (and 'risky') - basically, it is possible to find a statistically significant p-value by messing around with the model formula; however, the models may be meaningless. Without also looking at the standard errors, the beta coefficients, and odds ratios, one cannot really make any interpretation based solely on the p-value. Also, should you be adjusting for population stratification?
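As a rough illustration of the quantities mentioned above, here is a minimal R sketch on simulated data. The variable names, sample size, and effect sizes are all made up for the example and are not taken from the study. It puts the betas, standard errors, odds ratios, and Wald confidence intervals side by side with the p-values:

```r
# Simulated data only: every number below is an assumption for illustration.
set.seed(1)
n <- 2300
snp <- rbinom(n, 2, 0.3)          # 0/1/2 coding of the effect allele
exposure <- rnorm(n, mean = 5)    # continuous exposure
outcome <- rbinom(n, 1, plogis(-0.5 + 0.1 * snp + 0.05 * exposure))

fit <- glm(outcome ~ snp * exposure, family = binomial)

est <- coef(summary(fit))         # Estimate, Std. Error, z value, Pr(>|z|)
ci <- confint.default(fit)        # Wald intervals on the log-odds scale

report <- data.frame(
  beta    = est[, "Estimate"],
  se      = est[, "Std. Error"],
  OR      = exp(est[, "Estimate"]),
  OR_low  = exp(ci[, 1]),
  OR_high = exp(ci[, 2]),
  p       = est[, "Pr(>|z|)"]
)
print(round(report, 4))
```

A very small p-value paired with a huge standard error or an implausible odds ratio is usually a sign that the model, rather than the biology, is producing the signal.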

Thank you Kevin. I am adjusting for PCA components, and this is the result of a meta-analysis of two different studies. My model is `outcome ~ exposure:SNP + SNP + exposure + other cofactors`; I then ran the meta-analysis, from which I get the significant result.

Ah, a model formula like this:

```
outcome ~ exposure:SNP + SNP + exposure
```

...is the same as:

```
outcome ~ exposure * SNP
```

That is, it is a multiplicative model, also sometimes called the 'log-additive model'. Perhaps this may assist in the interpretation? As an example, I conducted a similar study in 2016 (but it was conditional regression with `Family ID` as the matched strata) and I also used a multiplicative model. How are the standard errors?
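The equivalence of the two formulas can be checked directly. The sketch below uses simulated data (the data frame `d` and its values are assumptions for the example) and fits both versions with `glm`; since the two formulas expand to the same set of terms, the fits should agree up to numerical noise:

```r
# Simulated data only; names mirror the formulas quoted in this thread.
set.seed(2)
n <- 500
d <- data.frame(
  SNP = rbinom(n, 2, 0.3),
  exposure = rnorm(n)
)
d$outcome <- rbinom(n, 1, plogis(-0.3 + 0.2 * d$SNP + 0.1 * d$exposure))

fit_long  <- glm(outcome ~ exposure:SNP + SNP + exposure,
                 family = binomial, data = d)
fit_short <- glm(outcome ~ exposure * SNP,
                 family = binomial, data = d)

# Compare the fits themselves rather than relying on coefficient order.
same_fit <- isTRUE(all.equal(unname(fitted(fit_long)),
                             unname(fitted(fit_short)),
                             tolerance = 1e-6))
print(same_fit)
print(c(deviance(fit_long), deviance(fit_short)))
```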

Lemire has provided an answer, below.

Lemire600 wrote:

In your regression equation, you have the following terms:

beta_s * SNP + beta_i * SNP * exposure (ignoring the other ones you may have)

The estimate for beta_s (from which you derived your significance) is the slope of the effect of the SNP on your outcome when the exposure variable is equal to 0. That's how you need to interpret your result. The only thing you can say from your output is that your SNP has a significant effect when the exposure is 0. If your exposure were equal to, e.g., 2, then the effect (slope) of the SNP would be beta_s + 2*beta_i (which would have a different standard error and thus a different significance level). Don't overinterpret each coefficient taken separately.
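To make this concrete, here is a small R sketch on simulated data. The helper `snp_effect_at` and all numbers are hypothetical and written just for this illustration. It computes the SNP slope beta_s + x*beta_i at a chosen exposure value x, with a standard error taken from the coefficient covariance matrix: Var = Var(beta_s) + x^2 * Var(beta_i) + 2 * x * Cov(beta_s, beta_i).

```r
# Simulated data only; the exposure is deliberately centred far from 0.
set.seed(3)
n <- 2300
SNP <- rbinom(n, 2, 0.3)
exposure <- rnorm(n, mean = 5, sd = 1)
outcome <- rbinom(n, 1, plogis(-1 + 0.15 * SNP + 0.1 * exposure))

fit <- glm(outcome ~ SNP * exposure, family = binomial)

# Hypothetical helper: SNP effect (log-odds per allele) at exposure = x.
snp_effect_at <- function(fit, x, snp = "SNP", inter = "SNP:exposure") {
  b <- coef(fit)
  V <- vcov(fit)
  est <- b[[snp]] + x * b[[inter]]
  se <- sqrt(V[snp, snp] + x^2 * V[inter, inter] + 2 * x * V[snp, inter])
  z <- est / se
  c(exposure = x, beta = est, se = se, z = z, p = 2 * pnorm(-abs(z)))
}

res <- t(sapply(c(0, 2, mean(exposure)), snp_effect_at, fit = fit))
print(round(res, 4))

# Centring the exposure makes the reported SNP coefficient refer to the
# mean exposure instead of exposure = 0.
fit_c <- glm(outcome ~ SNP * I(exposure - mean(exposure)), family = binomial)
print(coef(fit_c)[["SNP"]])
```

The row for exposure 0 reproduces the `SNP` line of the usual summary table, while the refit with a centred exposure reports the effect at the mean exposure. If 0 lies outside the observed exposure range, the uncentred main-effect p-value describes an extrapolation, which is one reason it can look very different from the marginal (no-interaction) result.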