Question: How to perform edgeR based on serial paired samples? (model.matrix design)
3
gravatar for CandiceChuDVM
2.2 years ago by
CandiceChuDVM1.9k
United States/College Station/Texas A&M University
CandiceChuDVM1.9k wrote:

Hi all,

I have 4 paired samples in treatment vs control at 3 different time points.
How can I set up a model.matrix so I can treat them as paired samples and compare the following comparisons in glmLRT of edgeR:

  1. Treatment vs control (paired samples), regardless of time
  2. Treatment vs control (paired samples) at T1
  3. Treatment vs control (paired samples) at T2
  4. Treatment vs control (paired samples) at T3

Tentative code:

subject <- factor(c(rep("1", 3),rep("2", 2),rep("3", 3),rep("4", 3),
                rep("1", 3),rep("2", 3),rep("3", 3),rep("4", 3)))
condition <- factor(c(rep("treatment",11), rep("control",12)), levels=c("control","treatment")) #missing one treatment data
timepoints <- factor(c("t1","t2","t3","t1","t2","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3"), levels=c("t1","t2","t3"))
sampleTable <- data.frame(subject = as.factor(subject),
                    condition = as.factor(condition),
                      timepoints = as.factor(timepoints))
design <- model.matrix(~subject+condition+timepoints+condition:timepoints)
fit <- glmFit(y, design)

Can my model.matrix fulfill what I need?

Sample table:

subject condition timepoints
1 treatment t1
1 treatment t2
1 treatment t3
2 treatment t1
2 treatment t2
3 treatment t1
3 treatment t2
3 treatment t3
4 treatment t1
4 treatment t2
4 treatment t3
1 control t1
1 control t2
1 control t3
2 control t1
2 control t2
2 control t3
3 control t1
3 control t2
3 control t3
4 control t1
4 control t2
4 control t3

rna-seq edger R • 1.5k views
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by CandiceChuDVM1.9k
1

Not an actual answer but just a question I got by looking at your design matrix, shouldn't you add a +0 such that the first column doesn't represent the intercept? (design <- model.matrix(~0+subject+condition+timepoints+condition:timepoints)). Also as I was writing this I checked out the manual and this seems very similar to the section 3.3 of the edgeR User Guide, perhaps it is worth a look?

ADD REPLYlink written 2.2 years ago by biofalconch380
3

The intercept doesn't matter, some people replace it since it makes their contrasts easier to specify or explain.

ADD REPLYlink written 2.2 years ago by russhh4.6k
2
gravatar for Devon Ryan
2.2 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

Given all of the contrasts you want to perform, you'd be best served making groups out of your dataset:

subject group
1       treatment_t1
1       treatment_t2
1       treatment_t3
2       treatment_t1
...

Then use a design of ~0 + subject + group, after which you can more easily provide the contrasts you want:

  1. (treatment_t1+treatment_t2+treatment_t3)/3 - (control_t1+control_t2+control_t3)/3
  2. treatment_t1 - control_t1
  3. treatment_t2 - control_t2
  4. treatment_t3 - control_t3
ADD COMMENTlink written 2.2 years ago by Devon Ryan91k
1

I agree to what Devon suggested. Subject and group should be better to build the model matrix and then proceed with your contracts. Just to add up. Seems like you have paired data. Won't the tests also specify that while computing the contracts specific DE? I believe here paired testing should come into play.

ADD REPLYlink written 2.2 years ago by ivivek_ngs4.8k
2

The pairing already took place in the model fit, so unless someone wants to look at per-subject changes it's not going to be mentioned subsequently.

ADD REPLYlink written 2.2 years ago by Devon Ryan91k
1

Ah nice, this is nice. Have not had paired data for a while probably that is the reason I was confused. Thanks for the clarification. Something again I learned today.

ADD REPLYlink written 2.2 years ago by ivivek_ngs4.8k
1

Hi Devon,

When I apply design <- model.matrix(~0 + subject + group), the colnames(design) are:

[1] "subject1"         "subject2"         "subject3"         "subject4"        
[5] "groupaffected_t2" "groupaffected_t3" "groupcontrol_t1"  "groupcontrol_t2" 
[9] "groupcontrol_t3"

In this situation, how can I specifically choose coef or contrast in lrt <- glmLRT(fit) to fulfill the contrast you proposed?

Thanks a lot!

ADD REPLYlink written 2.2 years ago by CandiceChuDVM1.9k
1

Think about it: What is the fitted value for the each subject's treatment-induced difference at time1:

subject1: (subject1) - (subject1 + groupcontrol_t1) = -groupcontrol_t1 # I'd have put the treatment arm as the second level myself

subject2: (subject2) - (subject2 + groupcontrol_t1) = -groupcontrol_t1

... and so on for the other two subjects. The average difference across the subjects is therefore '-groupcontrol_t1'; this is contrast 2 in Devon's answer.

The other single-time contrasts are worked out similarly.

The first contrast is just the average of contrasts 2:4 (based on your factor levels, this should be -(1/3) * (groupcontrol_t1 +groupcontrol_t2 + groupcontrol_t3)).

ADD REPLYlink written 2.2 years ago by russhh4.6k

Did you merge your affected_t1 group with something else? Anyway, prepend group to what I wrote and swap affected for treatment, since you apparently changed the names from your post.

Edit Silly me, the subject coefficients combine to form affected_t1.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by Devon Ryan91k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1316 users visited in the last hour