How to perform edgeR based on serial paired samples? (model.matrix design)
1
3
Entering edit mode
6.8 years ago
CandiceChuDVM ★ 2.4k

Hi all,

I have 4 paired samples in treatment vs control at 3 different time points.
How can I set up a model.matrix so I can treat them as paired samples and compare the following comparisons in glmLRT of edgeR:

  1. Treatment vs control (paired samples), regardless of time
  2. Treatment vs control (paired samples) at T1
  3. Treatment vs control (paired samples) at T2
  4. Treatment vs control (paired samples) at T3

Tentative code:

subject <- factor(c(rep("1", 3),rep("2", 2),rep("3", 3),rep("4", 3),
                rep("1", 3),rep("2", 3),rep("3", 3),rep("4", 3)))
condition <- factor(c(rep("treatment",11), rep("control",12)), levels=c("control","treatment")) #missing one treatment data
timepoints <- factor(c("t1","t2","t3","t1","t2","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3"), levels=c("t1","t2","t3"))
sampleTable <- data.frame(subject = as.factor(subject),
                    condition = as.factor(condition),
                      timepoints = as.factor(timepoints))
design <- model.matrix(~subject+condition+timepoints+condition:timepoints)
fit <- glmFit(y, design)

Can my model.matrix fulfill what I need?

Sample table:

subject condition timepoints
1 treatment t1
1 treatment t2
1 treatment t3
2 treatment t1
2 treatment t2
3 treatment t1
3 treatment t2
3 treatment t3
4 treatment t1
4 treatment t2
4 treatment t3
1 control t1
1 control t2
1 control t3
2 control t1
2 control t2
2 control t3
3 control t1
3 control t2
3 control t3
4 control t1
4 control t2
4 control t3

edgeR r RNA-Seq • 4.6k views
ADD COMMENT
1
Entering edit mode

Not an actual answer but just a question I got by looking at your design matrix, shouldn't you add a +0 such that the first column doesn't represent the intercept? (design <- model.matrix(~0+subject+condition+timepoints+condition:timepoints)). Also as I was writing this I checked out the manual and this seems very similar to the section 3.3 of the edgeR User Guide, perhaps it is worth a look?

ADD REPLY
3
Entering edit mode

The intercept doesn't matter, some people replace it since it makes their contrasts easier to specify or explain.

ADD REPLY
3
Entering edit mode
6.8 years ago

Given all of the contrasts you want to perform, you'd be best served making groups out of your dataset:

subject group
1       treatment_t1
1       treatment_t2
1       treatment_t3
2       treatment_t1
...

Then use a design of ~0 + subject + group, after which you can more easily provide the contrasts you want:

  1. (treatment_t1+treatment_t2+treatment_t3)/3 - (control_t1+control_t2+control_t3)/3
  2. treatment_t1 - control_t1
  3. treatment_t2 - control_t2
  4. treatment_t3 - control_t3
ADD COMMENT
1
Entering edit mode

I agree to what Devon suggested. Subject and group should be better to build the model matrix and then proceed with your contracts. Just to add up. Seems like you have paired data. Won't the tests also specify that while computing the contracts specific DE? I believe here paired testing should come into play.

ADD REPLY
2
Entering edit mode

The pairing already took place in the model fit, so unless someone wants to look at per-subject changes it's not going to be mentioned subsequently.

ADD REPLY
1
Entering edit mode

Ah nice, this is nice. Have not had paired data for a while probably that is the reason I was confused. Thanks for the clarification. Something again I learned today.

ADD REPLY
1
Entering edit mode

Hi Devon,

When I apply design <- model.matrix(~0 + subject + group), the colnames(design) are:

[1] "subject1"         "subject2"         "subject3"         "subject4"        
[5] "groupaffected_t2" "groupaffected_t3" "groupcontrol_t1"  "groupcontrol_t2" 
[9] "groupcontrol_t3"

In this situation, how can I specifically choose coef or contrast in lrt <- glmLRT(fit) to fulfill the contrast you proposed?

Thanks a lot!

ADD REPLY
1
Entering edit mode

Think about it: What is the fitted value for the each subject's treatment-induced difference at time1:

subject1: (subject1) - (subject1 + groupcontrol_t1) = -groupcontrol_t1 # I'd have put the treatment arm as the second level myself

subject2: (subject2) - (subject2 + groupcontrol_t1) = -groupcontrol_t1

... and so on for the other two subjects. The average difference across the subjects is therefore '-groupcontrol_t1'; this is contrast 2 in Devon's answer.

The other single-time contrasts are worked out similarly.

The first contrast is just the average of contrasts 2:4 (based on your factor levels, this should be -(1/3) * (groupcontrol_t1 +groupcontrol_t2 + groupcontrol_t3)).

ADD REPLY
0
Entering edit mode

Did you merge your affected_t1 group with something else? Anyway, prepend group to what I wrote and swap affected for treatment, since you apparently changed the names from your post.

Edit Silly me, the subject coefficients combine to form affected_t1.

ADD REPLY
0
Entering edit mode

Hi guys, I have some questions about model.matrix design for paired samples

My data: samples collected longitudinally from 8 individuals treated from day 1 to day 17, DAY 0 is the control (no treatment)

#Model matrix
subject <- factor(c(rep(1:8,3), rep(5:8,6)))
timepoint <- factor(c(rep("DAY0",8), rep("DAY1",8), rep("DAY3",8), rep("DAY5",4), rep("DAY7",4),
                      rep("DAY10",4), rep("DAY12",4), rep("DAY15",4), rep("DAY17",4)))
sampleTable <- data.frame(subject = as.factor(subject),
                          timepoint = as.factor(timepoint))
design <- model.matrix(~0 + subject + timepoint)
contr.matrix <- makeContrasts(
  D17vsD0 = timepointDAY17, #1
  D15vsD0 = timepointDAY15, #2
  D12vsD0 = timepointDAY12, #3
  D10vsD0 = timepointDAY10, #4
  D7vsD0 = timepointDAY7, #5
  D5vsD0 = timepointDAY5, #6
  D3vsD0 = timepointDAY3, #7
  D1vsD0 = timepointDAY1, #8
  levels=design) 

#Count data
x <- counts.cpm.filtered
DGEx <- DGEList(counts=x)
DGEx <- calcNormFactors(DGEx, method = "TMM")

#Stats
v <- voom(DGEx, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
summary(decideTests(efit))

Q1: I saw many model.matrix for paired samples including or not ~0 on the design. What are the implications of using ~0 (intercept)? How it works?

Q2: As DAY0 is not on colnames of my design (is the intercept), I was not able to specify it on my contrast. The contrast will do comparisons against DAY0?

Q3: I have no technical replicates, It is ok to apply this model?

ADD REPLY

Login before adding your answer.

Traffic: 2179 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6