Question

Repeated measures design matrix controlling for covariates age and sex


2.6 years ago

ASid ▴ 40

Hi Gordon Smyth, I need to analyse paired repeated-measures data; an example is below. I want to test for differential expression between day 7 and day 0 in my metabolite expression data (which is similar to gene expression data) while controlling for two covariates, sex and age. I also want to identify the metabolites that are affected only by sex or only by age.

Dummy data: besides the expression value (exp) there are four columns. id is my subject id (there are 11 unique subjects); time is pre- or post-vaccination (T1 = pre-vaccination, T2 = post-vaccination day 7); sex is a factor and age is a continuous covariate.

> data
    id time exp sex age
1   M1   T1 1.0   m  46
2   M2   T1 1.0   m  12
3   M3   T1 1.2   f  20
4   M4   T1 1.0   f  13
5   M5   T1 1.5   m  30
6   M6   T1 1.3   f  21
7   M7   T1 0.8   f  23
8   M8   T1 0.7   m  26
9   M9   T1 0.6   f  60
10 M10   T1 1.3   f  65
11 M11   T1 1.5   f  68
12  M1   T2 2.0   m  46
13  M2   T2 2.4   m  12
14  M3   T2 2.0   f  20
15  M4   T2 2.3   f  13
16  M5   T2 2.1   m  30
17  M6   T2 1.7   f  21
18  M7   T2 5.4   f  23
19  M8   T2 6.7   m  26
20  M9   T2 3.1   f  60
21 M10   T2 3.4   f  65
22 M11   T2 3.7   f  68

Theoretically, I was under the impression that for one gene or one metabolite (the dummy data above), the model using the design matrix below should work:

design <- model.matrix(~0+id+time+sex+age, data=data)
mymodel <- lm(exp~0+id+time+sex+age, data=data)


> design
   idM1 idM10 idM11 idM2 idM3 idM4 idM5 idM6 idM7 idM8 idM9 timeT2 sexm age
1     1     0     0    0    0    0    0    0    0    0    0      0    1  46
2     0     0     0    1    0    0    0    0    0    0    0      0    1  12
3     0     0     0    0    1    0    0    0    0    0    0      0    0  20
4     0     0     0    0    0    1    0    0    0    0    0      0    0  13
5     0     0     0    0    0    0    1    0    0    0    0      0    1  30
6     0     0     0    0    0    0    0    1    0    0    0      0    0  21
7     0     0     0    0    0    0    0    0    1    0    0      0    0  23
8     0     0     0    0    0    0    0    0    0    1    0      0    1  26
9     0     0     0    0    0    0    0    0    0    0    1      0    0  60
10    0     1     0    0    0    0    0    0    0    0    0      0    0  65
11    0     0     1    0    0    0    0    0    0    0    0      0    0  68
12    1     0     0    0    0    0    0    0    0    0    0      1    1  46
13    0     0     0    1    0    0    0    0    0    0    0      1    1  12
14    0     0     0    0    1    0    0    0    0    0    0      1    0  20
15    0     0     0    0    0    1    0    0    0    0    0      1    0  13
16    0     0     0    0    0    0    1    0    0    0    0      1    1  30
17    0     0     0    0    0    0    0    1    0    0    0      1    0  21
18    0     0     0    0    0    0    0    0    1    0    0      1    0  23
19    0     0     0    0    0    0    0    0    0    1    0      1    1  26
20    0     0     0    0    0    0    0    0    0    0    1      1    0  60
21    0     1     0    0    0    0    0    0    0    0    0      1    0  65
22    0     0     1    0    0    0    0    0    0    0    0      1    0  68

The first eleven parameters, created by ~0+id, account for each individual and for the pairing of samples at T1; timeT2 should give the difference between T2 and T1. However, I am not sure whether I am accounting for the other two covariates correctly. I would really appreciate your help. Best, Amnah

analysis Paired • 1.6k views

updated 2.6 years ago by Gordon Smyth ★ 7.0k • written 2.6 years ago by ASid ▴ 40

score 1 · Answer 1 · 2021-09-27

"I want to account for sex and age effect"

Paired analyses already completely control for all subject-specific characteristics including age and sex. That's the whole reason for doing a paired analysis. You can't "correct" for these variables a second time by adding them to the design matrix as extra variables. If you try to do so in a limma analysis, limma will simply remove the age and sex variables as superfluous.
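A minimal base R sketch of this point, using the dummy data from the question. It only checks that the sex and age columns add nothing once the subject columns are in the design; the names design_rank, aliased, paired_diff and time_effect are illustrative and not part of the original post.

```r
# Dummy data copied from the question (11 subjects, two time points).
data <- data.frame(
  id   = rep(paste0("M", 1:11), times = 2),
  time = rep(c("T1", "T2"), each = 11),
  exp  = c(1.0, 1.0, 1.2, 1.0, 1.5, 1.3, 0.8, 0.7, 0.6, 1.3, 1.5,
           2.0, 2.4, 2.0, 2.3, 2.1, 1.7, 5.4, 6.7, 3.1, 3.4, 3.7),
  sex  = rep(c("m", "m", "f", "f", "m", "f", "f", "m", "f", "f", "f"), times = 2),
  age  = rep(c(46, 12, 20, 13, 30, 21, 23, 26, 60, 65, 68), times = 2)
)

# The design from the question: 14 columns, but sex and age are constant
# within each subject, so they are linear combinations of the id columns.
design <- model.matrix(~ 0 + id + time + sex + age, data = data)
n_cols <- ncol(design)
design_rank <- qr(design)$rank   # smaller than n_cols: the design is not full rank

# lm() reports the redundant coefficients as NA.
fit <- lm(exp ~ 0 + id + time + sex + age, data = data)
aliased <- names(coef(fit))[is.na(coef(fit))]

# The time coefficient is just the mean within-subject difference,
# i.e. what a paired t-test estimates.
paired_diff <- data$exp[data$time == "T2"] - data$exp[data$time == "T1"]
time_effect <- unname(coef(fit)["timeT2"])
```

Here aliased contains the sexm and age coefficients, and time_effect matches mean(paired_diff), so dropping sex and age from the formula leaves the vaccination estimate unchanged.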

"I also want to identify the metabolites that are only affected by sex or only affected by age"

Sorry, that doesn't make sense. The purpose of your experiment is to determine the effect of vaccination for each patient. The age or sex of the patient doesn't change when they get vaccinated so there are no changes with age or sex.

You could conceivably find vaccination effects that are specific to patients of one gender or specific to certain age groups. That would be a meaningful analysis to do, although the number of patients you have is minimal for that type of analysis. The analysis would look for an interaction between vaccination and age or between vaccination and sex.
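One way such an interaction analysis could be set up is sketched below in base R for the single dummy metabolite. This is an assumption about how to code it, not a design given in the thread: the columns post, post_male and post_age and the centring of age are illustrative choices.

```r
# Dummy data copied from the question (11 subjects, two time points).
data <- data.frame(
  id   = rep(paste0("M", 1:11), times = 2),
  time = rep(c("T1", "T2"), each = 11),
  exp  = c(1.0, 1.0, 1.2, 1.0, 1.5, 1.3, 0.8, 0.7, 0.6, 1.3, 1.5,
           2.0, 2.4, 2.0, 2.3, 2.1, 1.7, 5.4, 6.7, 3.1, 3.4, 3.7),
  sex  = rep(c("m", "m", "f", "f", "m", "f", "f", "m", "f", "f", "f"), times = 2),
  age  = rep(c(46, 12, 20, 13, 30, 21, 23, 26, 60, 65, 68), times = 2)
)

post  <- as.numeric(data$time == "T2")   # 1 for post-vaccination samples
male  <- as.numeric(data$sex == "m")
age_c <- data$age - mean(data$age)       # centred age (illustrative choice)

# Subject columns handle the pairing; the last three columns are the
# vaccination effect and its interactions with sex and age.
design <- cbind(model.matrix(~ 0 + id, data = data),
                post = post, post_male = post * male, post_age = post * age_c)
design_rank <- qr(design)$rank           # full rank, unlike ~0+id+time+sex+age

fit <- lm.fit(design, data$exp)

# Equivalent view: regress the 11 within-subject differences on sex and age.
d <- data.frame(diff  = data$exp[post == 1] - data$exp[post == 0],
                sex   = data$sex[post == 0],
                age_c = age_c[post == 0])
fit_d <- lm(diff ~ sex + age_c, data = d)
```

The coefficients post, post_male and post_age agree with the intercept, sexm and age_c coefficients of fit_d, and only 8 residual degrees of freedom remain, which illustrates why 11 patients is minimal for this kind of analysis. For a full metabolite matrix the same design matrix could presumably be passed to limma's lmFit.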

score 0 · Answer 2 · 2021-09-28


2.6 years ago

ASid ▴ 40

Thank you very much for the explanation, Gordon Smyth; that makes sense to me. I have also been educating myself, and there are arguments that the subject-specific effects in a paired design can be better modelled as random effects in a linear mixed model (LMM), e.g. using the lme4 package in R, rather than as fixed effects. Two things I understand about using a mixed model instead of a fixed effects model are that missing values can be handled, and that fixed effects models eat up degrees of freedom, which reduces statistical power. What is your opinion on that? I am sorry if this is a naive question, but I am fairly new to this area.

Many thanks. Amnah


I think I have already answered your original question. I will add a few responses to your new questions:

limma fits random effects models using the duplicateCorrelation function (see the User's Guide section on multi-level models).
Random effects models are appropriate for some datasets and some scientific purposes, but there is no merit in a random effects model for the dataset in your question above. Mixed models are statistically very subtle, but essentially they only offer more information than a fixed model approach when the factor of interest is unbalanced with respect to the blocking variable.
limma always has more power than lme4 because it is able to borrow information between genes.
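For reference, a rough sketch of what the duplicateCorrelation route looks like in limma. The matrix y is simulated purely for illustration (50 hypothetical metabolites, 22 samples laid out as in the question) and is not data from this thread; for the balanced paired dataset above, the fixed subject effect design remains the simpler choice.

```r
library(limma)

set.seed(1)
id   <- rep(paste0("M", 1:11), times = 2)          # blocking variable (subject)
time <- factor(rep(c("T1", "T2"), each = 11))

# Hypothetical simulated data: each metabolite gets a subject effect that is
# shared by the T1 and T2 samples of the same subject, plus independent noise.
n_met <- 50
subject_effect <- matrix(rnorm(n_met * 11), n_met, 11)
y <- cbind(subject_effect, subject_effect) + matrix(rnorm(n_met * 22), n_met, 22)
rownames(y) <- paste0("met", seq_len(n_met))

# Subject is treated as a random effect: it is not in the design matrix,
# and the within-subject correlation is estimated across all metabolites.
design <- model.matrix(~ time)
corfit <- duplicateCorrelation(y, design, block = id)

fit <- lmFit(y, design, block = id, correlation = corfit$consensus.correlation)
fit <- eBayes(fit)                                 # borrows information between metabolites
top <- topTable(fit, coef = "timeT2")
```

corfit$consensus.correlation is the single within-subject correlation shared by all metabolites, and top lists the metabolites ranked by evidence for a T2 versus T1 difference.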

If you have a question in the future about using limma for a particular dataset then I will try to help you. I don't have any more time for general discussions however.

2.6 years ago by Gordon Smyth ★ 7.0k