Dear Biostars community,
I have two questions.
First, let me explain about my data.
T0 (the starting point of medication) and T4 (the time point when DNA methylation data was obtained from blood samples) are separated by a span of two years.
We coded individuals who took lithium for two years as 'cases' and those who did not as 'controls,' assigning each group an Age_group value of 1 or 0, respectively.
[data] result (data.frame)
The goal of this analysis is to examine whether lithium treatment could reduce the Epigenetic Age predicted from DNA methylation data.
The covariates included in the model are Age, Sex, PC1, PC2, PC3, PC4, PC5, B, CD4T, CD8T, Mono, NK, and Neutro (cell type proportions).
Initially, I simply used
lm(Epigenetic_age ~ Age_group + Age + Sex + PC1 + PC2 + PC3 + PC4 + PC5 + B + CD4T + CD8T + Mono + NK + Neutro)
to observe the association.
However, to achieve a more accurate model, I implemented model.matrix directly based on the guidelines from the following source: https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html#treating-factors-that-are-not-of-direct-interest-as-random-effects.
I intended to include individual as a random effect ,therefore, I grouped the samples based on Group) Age_group_Time
and created the model.matrix accordingly
design <- model.matrix( ~ 0 + Group + AGE + SEX + BMI + ALCOHOL_USE + SMOKER + BD_Type + batch + Acute_Stable + Center + PC1 + PC2 + PC3 + PC4 + PC5 + B + NK + CD4T + CD8T + Mono + Neutro , data = result)
And then, I ran the lmfit and set up a contrast matrix for comparison.
fit<- lmFit(object = result$Epigenetic_Age, design, block = Subject, correlation = cor$consensus.correlation)
contrasts_Eigenetic_Age <- makeContrasts(
R_T4vsT0=Group1_T4-Group1_T0,
NR_T4vsT0=Group0_T4-Group0_T0,
RvsNR=(Group1_T4-Group1_T0)-(Group0_T4-Group0_T0),
levels = colnames(design)
)
fit_PCGrimAge <- contrasts.fit(fit, contrasts_Epigenetic_Age)
fit_PCGrimAge$coefficients
Currently, I can only obtain coefficients as output.
1. Could you advise on how I could modify my R code to derive p-values and determine statistical significance?
<h6>#</h6>In another analysis, each individual is split into Case and Control groups without a two-year time gap.
Specifically, Case refers to individuals who took lithium over the past two years, and Control to those who took other medications within the same period.
I aimed to observe the effects of two years of lithium treatment on Epigenetic Age inferred from DNA methylation data.
I implemented the analysis with the following code,
lmm <- lm(Epigenetic_Age ~ AGE_Group + AGE + SEX + PC1 + PC2 + PC3 + PC4 + PC5 + B + NK + CD4T + CD8T + Mono + Neutro, data = result2)
yet I'm uncertain if it was executed correctly.
Furthermore, the results show large estimates for certain covariates (B, CD4T, CD8T, Mono, NK, Neutro).
Could you provide insights on how to interpret these results? Or Could you please give me an advice for more appropriate model for this analysis?
Any advice would be greatly appreciated.
Thank you.
Best
Yujin
You're basically missing the following steps after
fit_PCGrimAge <- contrasts.fit(fit, contrasts_Epigenetic_Age)
, I think:fittab
should have the p-values and log fold changes for the contrast(s) you are interested in (RvsNR
I'm guessing).Dear Dunois,
Thanks for your reply.
I thought I cannot use eBayes because compared to methylation array data which has many model (many CpG sites), I have only one model where dependent variable is epigenetic_age...!
Is it okay to use limma for further steps?
Many thanks,
Yujin
I am unfortunately not sufficiently qualified to comment on this aspect of your problem.
Gordon Smyth , the developer of
limma
and statistician extraordinaire is (if I am correct) still active here. Let's see if they can chime in perhaps. If you don't get a response here, I suggest also trying theBioconductor
forums and perhapsCross Validated
.Thank you for your comment Dunois!
Best,
Yujin
You are welcome.