Question

What do very low estimates but a significant p-value mean?

2.8 years ago

ritz ▴ 10

Hi,

I ran a multiple linear regression to check the relationship between methylation and cortisol.

When I extract the results I have a significant site (by p-value), but the coefficients are really small. I am having trouble interpreting the results. Would that mean that this has no practical significance (because the estimates are very small)?

Or do I need to standardize the independent variable (x = cortisol) that I have used? (If I have to standardize, can you please recommend how to do it via limma?)

                         coef    AveExpr          t       P.Value    adj.P.Val
    #cg20460797  -0.005752230  -7.026308  -6.440652  3.950537e-09  0.003194222
    #cg11409463   0.002670708  2.8029989   5.956008  3.568092e-08   0.02884995
    #cg01061425  -0.003401554  -6.637181  -5.405267  4.260207e-07  0.125154865

Thank you!

coefficients pvalue regression • 1.5k views

updated 2.8 years ago by Papyrus ★ 2.9k • written 2.8 years ago by ritz ▴ 10

It means 1) you performed a lot of tests (judging by the adjusted p-values) and 2) you have a large sample.

2.8 years ago by German.M.Demidov ★ 2.9k

Small coefficients (with considerable variability) have little practical meaning when considered in isolation from other methylation markers. Taken all together, however, they may provide a powerful answer to your problem (e.g. polygenic risk scores, PRS, for genomic markers).

0.005 is not that small given that beta values range from 0 to 1.

2.8 years ago by German.M.Demidov ★ 2.9k

Thank you for your reply. I understand. I was a bit worried that I might have to standardize the variable after having run the regression. I do not have a very large sample size, though, but I did see some effect. Thank you!

2.8 years ago by ritz ▴ 10

score 2 · Accepted Answer · 2021-06-10

2.8 years ago

Papyrus ★ 2.9k

If you have fitted a continuous variable (i.e. if cortisol is numerical and continuous, not a factor), remember that the coefficient represents the change in methylation per unit change in your variable. It seems cortisol is continuous because you mention "standardizing" it.

So the coefficient may not be small at all; it depends on the range of your variable. For example, if changes of 30 units of cortisol between samples are normal in your data, that would correspond, for cg20460797, to changes of ~17% methylation (0.00575 × 30 ≈ 0.17). (This assumes you fitted b-values rather than M-values, but the reasoning is the same.)
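
As a quick check of that arithmetic, here is a minimal Python sketch that multiplies each coefficient from the table in the question by a shift in cortisol. The 30-unit shift is only the illustrative figure used in this answer, and the result is on whatever scale the model was fitted on (b-values or M-values).

```python
# Coefficients copied from the table in the question (change per 1 unit of cortisol).
coefs = {
    "cg20460797": -0.005752230,
    "cg11409463": 0.002670708,
    "cg01061425": -0.003401554,
}


def implied_change(coef, delta_x):
    """Change in the response implied by a linear model when x shifts by delta_x."""
    return coef * delta_x


# Illustrative difference in cortisol between two samples (an assumption, not data).
delta_cortisol = 30

changes = {cpg: implied_change(b, delta_cortisol) for cpg, b in coefs.items()}
for cpg, change in changes.items():
    print(f"{cpg}: {change:+.4f} per {delta_cortisol} units of cortisol")
```

For cg20460797 this gives about -0.17, so a coefficient that looks tiny per unit is not tiny over a realistic range of the predictor.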

Yes, it is not a factor but a continuous whole number ranging from 50 to 150. So, to interpret the coefficients better with respect to methylation, would it be better to standardize the variable? I actually fitted M-values, so the coefficients are basically logFC.

Thank you!

2.8 years ago by ritz ▴ 10

I guess standardizing could make your coefficients bigger if your variable has a large range, but they would still be in M-values and would be difficult to interpret biologically (better to additionally look at the b-values). AFAIK standardizing won't change the statistical testing too much, but maybe others can chime in. Nonetheless, when fitting many models with limma, the focus is usually on detecting associations with a variable rather than on each model per se (each CpG will be fitted differently). As we have commented before, IMHO once you have the CpGs ranked by significance, the most straightforward thing to do may be to use the b-values to also estimate effect sizes/associations if you want to explore them.
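
To make the point about standardizing concrete, here is a small sketch using ordinary least squares with one predictor in plain Python. It is not limma and does not reproduce limma's moderated statistics, and the data are made up for illustration. In this simple setting, z-scoring the predictor multiplies the slope by the predictor's standard deviation and leaves the t-statistic unchanged.

```python
import math
import statistics


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (slope, t-statistic of the slope)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


# Made-up data: a cortisol-like predictor (50-150) and an M-value-like response.
cortisol = [52, 67, 75, 88, 93, 104, 118, 126, 139, 148]
m_value = [-6.6, -6.9, -6.8, -7.0, -6.9, -7.2, -7.1, -7.4, -7.3, -7.5]

mean = statistics.fmean(cortisol)
sd = statistics.stdev(cortisol)
cortisol_z = [(c - mean) / sd for c in cortisol]  # standardized predictor

slope_raw, t_raw = simple_ols(cortisol, m_value)
slope_z, t_z = simple_ols(cortisol_z, m_value)

print(f"raw:          slope={slope_raw:.5f}  t={t_raw:.3f}")
print(f"standardized: slope={slope_z:.5f}  t={t_z:.3f}")
```

The standardized slope reads as "change per one standard deviation of cortisol", which is often easier to talk about, but it is still on the M-value scale.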

2.8 years ago by Papyrus ★ 2.9k

Yes, I do understand that standardizing may not change the statistical testing, but I think the coefficients would be easier to interpret.

Okay, say cg11409463 and cg20460797 are significant. Are you suggesting that I run an lm model on just these significant CpGs using beta values, and calculate the coefficients on the beta scale for better biological interpretation?

Another issue I had while shifting from M-values to beta values is that I use SVA, and the surrogate variables are calculated from the M-values.

Thank you for your time.

2.8 years ago by ritz ▴ 10

Okay, there are multiple issues here then. You have a list of CpGs ranked by significance (adjusted p-val, after using limma). But you also want to look at the effect size of the changes. The problem is that you carry out the tests using M-values, which are difficult to interpret.
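
One way to see why M-value coefficients are hard to read biologically is to convert them back to b-values. The sketch below assumes the usual relationship M = log2(beta / (1 - beta)), with no offset term. The starting points and the shift of -0.5 are hypothetical, chosen only to show that the same M-value change corresponds to very different b-value changes depending on where the CpG sits.

```python
def m_to_beta(m):
    """Convert an M-value, assumed to be log2(beta / (1 - beta)), to a b-value in (0, 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


def beta_change(m_start, delta_m):
    """Change in b-value when the M-value moves from m_start to m_start + delta_m."""
    return m_to_beta(m_start + delta_m) - m_to_beta(m_start)


# The same hypothetical M-value shift applied at three hypothetical starting points.
delta_m = -0.5
for m_start in (-7.0, 0.0, 3.0):
    print(
        f"M={m_start:+.1f}  beta={m_to_beta(m_start):.4f}  "
        f"beta change for dM={delta_m}: {beta_change(m_start, delta_m):+.4f}"
    )
```

A CpG with an average M-value around -7 is almost unmethylated, so even a sizeable M-value shift moves its b-value very little, whereas the same shift near M = 0 moves it by several percentage points.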

The simplest approach would be to "ignore" everything and just compute the correlations between "raw" b-values and cortisol for those CpGs. This would give you a measure of effect size that you could use to filter or explore the CpGs. Many studies actually do this (looking directly at b-values). However, in a model with several covariates, what you test is not the "raw" values, because they are adjusted for the other covariates during model fitting. Thus, a better approach would be to either 1) adjust the b-values for your covariates and then look at the correlations, or 2) fit an lm to the b-values with the covariates and extract the coefficient. The first option may produce adjusted b-values below 0 or above 1 (so the biological interpretation may falter).
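
A rough sketch of option 1 for a single CpG and a single covariate, in plain Python. All of the data are made up, `age` stands in for whatever covariates (including SVs) were in the real model, and a real analysis would adjust for all of them at once rather than one at a time.

```python
import math
import statistics


def residuals(y, x):
    """Residuals of y after a least-squares fit on one covariate x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


# Made-up values for one CpG across eight samples.
age = [25, 31, 38, 44, 50, 57, 63, 70]
cortisol = [60, 95, 72, 130, 88, 145, 110, 120]
beta = [0.82, 0.74, 0.80, 0.66, 0.75, 0.60, 0.68, 0.64]

# Remove the part of the b-values explained by age, then add the mean back
# so the adjusted values stay on a b-value-like scale.
adjusted_beta = [statistics.fmean(beta) + r for r in residuals(beta, age)]

raw_r = pearson(beta, cortisol)
adj_r = pearson(adjusted_beta, cortisol)
out_of_range = [b for b in adjusted_beta if not 0.0 <= b <= 1.0]

print(f"raw r = {raw_r:.3f}, covariate-adjusted r = {adj_r:.3f}")
print(f"adjusted b-values outside [0, 1]: {len(out_of_range)}")
```

The `out_of_range` check is there because of the caveat about adjusted b-values leaving the 0 to 1 interval; if that happens for many CpGs, option 2 may be easier to defend.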

But there's an additional problem: some of your covariates are SVs from the SVA, which you calculated empirically from the M-values. Not only that, but because M-values differ from b-values, it is also possible that the estimated effects of the other covariates you used (e.g. gender, age) would change a bit when fitting b-values instead of M-values.

I would guess that the SVs you get with M-values would not be too different from those with b-values (try extracting both and checking, although maybe inputting b-values to SVA violates some assumptions because of their distributions), but I'm not sure. After all, if the SV detects, for example, a batch, this won't change too much regardless of the input data. Maybe this paper may help.

So I'm not sure of the best approximation to use. Nonetheless, it does not seem to be that important, because you have done the testing correctly (in M-values), and you just want to explore the results in the b-value scale. So even using "raw" b-values, if you state it clearly, may not be too misleading. You may also find that the ranking of effect size of CpGs using "raw" b-values as compared to using covariate-adjusted b-values is pretty similar.

2.8 years ago by Papyrus ★ 2.9k

Hi, thank you so much for the great advice and the paper. The results from beta values and M-values are similar, so I decided to go with the M-values.

Thank you again! Your advice solved the issue.

2.8 years ago by ritz ▴ 10

No problem! Good luck!

2.8 years ago by Papyrus ★ 2.9k