Logfold change (Limma Package) in methylation association studies
1
0
Entering edit mode
16 days ago
ritz ▴ 10

Hi all,

I need some advice. I am doing an analysis to look for an association between methylation and stress. I use lmfit model. My stress variable is continuous and I am not comparing any stress or control groups.

An example of my code:- design <- model.matrix(~ stress + age + gender + smoking + Plate , pheno)

sv <- sva(getM(eset.nosex), design, vfilter = 5e4) ##surrogate varaibles design <- cbind(design, sv$sv) fit <- lmFit(getM(eset.nosex), design) fit <- eBayes(fit) names(fit) # [1] "coefficients" "rank" "assign" "qr" # [5] "df.residual" "sigma" "cov.coefficients" "stdev.unscaled" # [9] "pivot" "Amean" "method" "design" # [13] "df.prior" "s2.prior" "var.prior" "proportion" # [17] "s2.post" "t" "df.total" "p.value" # [21] "lods" "F" "F.p.value" topTable(fit, 2) # logFC AveExpr t P.Value adj.P.Val B # cg22700843 -0.003161108 4.139965 -6.653562 1.448413e-09 0.0004906854 8.411029 # cg13026838 -0.004696756 2.379085 -6.652539 1.455450e-09 0.0004906854 8.406237 # cg01495737 -0.002898319 3.515948 -6.605255 1.820603e-09 0.0004906854 8.184942 # cg18390621 -0.006030697 5.923849 -6.292316 7.889849e-09 0.0015948422 6.736955 # cg26562879 -0.002665734 3.549174 -6.132382 1.651324e-08 0.0026703692 6.008840 # cg00198066 -0.003245212 3.618281 -5.942723 3.923888e-08 0.0052877921 5.156740 # cg24874350 0.007544302 -2.733933 5.898567 4.791739e-08 0.0055348287 4.960210 # cg20905655 0.004923986 2.635893 5.812363 7.064432e-08 0.0071399681 4.578617 How would one interpret log fold change in terms of methylation (especially when there is no group comparison and only an association study)? Arent estimates(coefficients) a better measurement for methylation in association studies? thankyou! limma logfoldchange • 274 views ADD COMMENT 0 Entering edit mode 16 days ago Papyrus ▴ 940 I think the logFC column actually gives the coefficients in this case, did you check in your fit$coefficients object and compare?

Nonetheless, even without getting to the logFCs, because you're fitting the models in M-values (which is the usually recommended method), even the coefficient values will be difficult to interpret biologically. It may be better to interpret/filter the CpGs using the beta values (or even covariate-adjusted beta values).

0
Entering edit mode

Do you mean the coefficients or the estimates?

Yes, I did check fit$coefficients. It gives coefficients for each variable in the design matrix. head(fit$coefficients)

                    (Intercept)     Stress              Gender2                     age


cg14817997 1.3965045 0.02149242 0.052808421 0.02321043 cg26928153 2.3554372 -0.02522342 -0.036111524 0.04056885 cg16269199 0.0455470 0.02982565 -0.114791997 0.06911121 cg13869341 0.2278520 0.05470675 -0.187123556 0.07042281 cg14008030 0.2342564 0.07319959 0.001113212 0.02870625 cg12045430 -3.7704926 -0.06934915 0.168677371 -0.03177391 Smoking Plate2 cg14817997 -0.16372064 0.6785156 cg26928153 -0.04224105 -0.5071137 cg16269199 -0.05919517 -0.8600021 cg13869341 -0.32086356 0.6786138 cg14008030 -0.23856418 2.3189791 cg12045430 0.23110099 -1.7975213

But what I would like to know is just for the variable of interest:- stress

topTable(fit2, 2)

                           logFC       AveExpr               t       P.Value             adj.P.Val        B


cg13547817 -0.3738731 3.5772747 -5.624865 1.955557e-07 0.1581173 5.915087 cg16692227 -0.3559524 0.9775661 -5.455437 4.043169e-07 0.1634560 5.334972 cg03287633 -0.1867909 3.0880473 -5.290847 8.107974e-07 0.1738228 4.778687 cg17474383 -0.2511642 -0.0911020 -5.226623 1.060811e-06 0.1738228 4.563692 cg21858516 -0.2169699 -0.7935851 -5.223461 1.074899e-06 0.1738228 4.553137 cg16294325 -0.2948937 3.3952271 -5.110209 1.719679e-06 0.2317423 4.177111

Usually, when you run a normal lm model (without lmfit) we get estimates, std. error, t value, and pvalue. Which is pretty easy to interpret.

Yes, true beta values are better for biological interpretation, but after running the model in lmfit (using m values) I can convert to beta values and that should work right?

meth_beta = topTable(fit2, 2 , sort.by="p", number = nrow(getM(eset.nosex)))

thanks again.

0
Entering edit mode

Yes, I meant that, if you fit a continuous covariate (e.g. stress), the logFC column actually gives the same (estimates/coefficients) as those estimates/coefficients contained in fit$coefficients. To check, reorder the 2 objects (the $logFC column from topTable and the fit$coefficients) to have the ordered CpG sites in the same objects and see that they are the same numbers: i.e. the $logFC, if you extract the second coefficient, should be the same as the Stress column in fit)

(I just did a lmFit in my own data for a continuous covariate (age) and checked that the logFC contained the same as the age coefficient in fit$coefficients). Unless we're using different terminology, I was referring to the coefficients/estimates as the same thing (they represent the mean of change in Y for unit of change in X, thus being a measure of effect size; if you had factors, they would be the means or changes in means between groups). Regarding going "back" from M-values to b-values, I don't think you can convert coefficients in M-value scale to coefficients in b-value scale, because the M- and b-value relationship is non-linear. For example, a change from 0 to 2 in M-value can be a 30% change in beta, while from 2 to 4, the beta change is much less (see Fig. 1 here). I would rather directly use the original beta values. Maybe you could also estimate the coefficients using beta values (with lmFit) and use those for ranking or filtering CpGs by how much they change in beta value, if you want a more biological interpretation, while using the p-values from the tests done with the M-values. ADD REPLY 0 Entering edit mode Thank you! Yes, I did check the$logfc measurements and fir\$coefficients and they are the same. That helps a lot.

yes, we cannot change the m-value coefficients to coefficients on the b-value scale. I did read the paper and they too advise using m-values for the analysis and b-values for the biological interpretation. I do use the beta values for plotting. If we need b-value-based coefficients then that would mean running the analysis with b-values and that doesn't make sense. It like a loop.

But thanks a ton, the logfc was a great help.

0
Entering edit mode

Yes, I would also do everything on M-values. The reason I suggested also fitting the analyses on b-values is because:

1) with lmFit you have a very fast way of obtaining stress coefficients in b-value scale, which you could us as (biologically interpretable) measures of change

2) because you use other covariates in the models (e.g. gender, age, smoking...), these stress coefficients will be adjusted for these variables and will reflect better what you are really testing in your models. This is because when you do the models you are not actually looking at changes in "raw" b-values, or M-values, in a sense. In these models the effects (the coefficients) of your variable of interest are "adjusted" for the other covariates. If you only looked at the "raw" b-values it would not be exactly the same that you were actually testing within limma.

0
Entering edit mode

That makes very good sense, and very understandable. I would prefer to use m-values too. I also find it easier to work with limma. I just feel it needs to be more 'methylation-friendly' (as in more guides and more examples). Limma is more expression array-based.

Thank you so much !