Question

How to perform edgeR based on serial paired samples? (model.matrix design)

3

Entering edit mode

6.9 years ago

CandiceChuDVM ★ 2.4k

Hi all,

I have 4 paired samples in treatment vs control at 3 different time points.
How can I set up a model.matrix so I can treat them as paired samples and compare the following comparisons in glmLRT of edgeR:

Treatment vs control (paired samples), regardless of time
Treatment vs control (paired samples) at T1
Treatment vs control (paired samples) at T2
Treatment vs control (paired samples) at T3

Tentative code:

subject <- factor(c(rep("1", 3),rep("2", 2),rep("3", 3),rep("4", 3),
                rep("1", 3),rep("2", 3),rep("3", 3),rep("4", 3)))
condition <- factor(c(rep("treatment",11), rep("control",12)), levels=c("control","treatment")) #missing one treatment data
timepoints <- factor(c("t1","t2","t3","t1","t2","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3"), levels=c("t1","t2","t3"))
sampleTable <- data.frame(subject = as.factor(subject),
                    condition = as.factor(condition),
                      timepoints = as.factor(timepoints))
design <- model.matrix(~subject+condition+timepoints+condition:timepoints)
fit <- glmFit(y, design)

Can my model.matrix fulfill what I need?

Sample table:

subject condition timepoints
1 treatment t1
1 treatment t2
1 treatment t3
2 treatment t1
2 treatment t2
3 treatment t1
3 treatment t2
3 treatment t3
4 treatment t1
4 treatment t2
4 treatment t3
1 control t1
1 control t2
1 control t3
2 control t1
2 control t2
2 control t3
3 control t1
3 control t2
3 control t3
4 control t1
4 control t2
4 control t3

edgeR r RNA-Seq • 4.6k views

ADD COMMENT • link 6.9 years ago by CandiceChuDVM ★ 2.4k

1

Entering edit mode

Not an actual answer but just a question I got by looking at your design matrix, shouldn't you add a +0 such that the first column doesn't represent the intercept? (design <- model.matrix(~0+subject+condition+timepoints+condition:timepoints)). Also as I was writing this I checked out the manual and this seems very similar to the section 3.3 of the edgeR User Guide, perhaps it is worth a look?

ADD REPLY • link 6.9 years ago by biofalconch ★ 1.1k

3

Entering edit mode

The intercept doesn't matter, some people replace it since it makes their contrasts easier to specify or explain.

ADD REPLY • link 6.9 years ago by russhh 5.7k

score 3 · Answer 1 · 2017-06-22

3

Entering edit mode

6.9 years ago

Devon Ryan 104k

Given all of the contrasts you want to perform, you'd be best served making groups out of your dataset:

subject group
1       treatment_t1
1       treatment_t2
1       treatment_t3
2       treatment_t1
...

Then use a design of ~0 + subject + group, after which you can more easily provide the contrasts you want:

(treatment_t1+treatment_t2+treatment_t3)/3 - (control_t1+control_t2+control_t3)/3
treatment_t1 - control_t1
treatment_t2 - control_t2
treatment_t3 - control_t3

ADD COMMENT • link 6.9 years ago by Devon Ryan 104k

1

Entering edit mode

I agree to what Devon suggested. Subject and group should be better to build the model matrix and then proceed with your contracts. Just to add up. Seems like you have paired data. Won't the tests also specify that while computing the contracts specific DE? I believe here paired testing should come into play.

ADD REPLY • link 6.9 years ago by ivivek_ngs ★ 5.2k

2

Entering edit mode

The pairing already took place in the model fit, so unless someone wants to look at per-subject changes it's not going to be mentioned subsequently.

ADD REPLY • link 6.9 years ago by Devon Ryan 104k

1

Entering edit mode

Ah nice, this is nice. Have not had paired data for a while probably that is the reason I was confused. Thanks for the clarification. Something again I learned today.

ADD REPLY • link 6.9 years ago by ivivek_ngs ★ 5.2k

1

Entering edit mode

Hi Devon,

When I apply design <- model.matrix(~0 + subject + group), the colnames(design) are:

[1] "subject1"         "subject2"         "subject3"         "subject4"        
[5] "groupaffected_t2" "groupaffected_t3" "groupcontrol_t1"  "groupcontrol_t2" 
[9] "groupcontrol_t3"

In this situation, how can I specifically choose coef or contrast in lrt <- glmLRT(fit) to fulfill the contrast you proposed?

Thanks a lot!

ADD REPLY • link 6.9 years ago by CandiceChuDVM ★ 2.4k

1

Entering edit mode

Think about it: What is the fitted value for the each subject's treatment-induced difference at time1:

subject1: (subject1) - (subject1 + groupcontrol_t1) = -groupcontrol_t1 # I'd have put the treatment arm as the second level myself

subject2: (subject2) - (subject2 + groupcontrol_t1) = -groupcontrol_t1

... and so on for the other two subjects. The average difference across the subjects is therefore '-groupcontrol_t1'; this is contrast 2 in Devon's answer.

The other single-time contrasts are worked out similarly.

The first contrast is just the average of contrasts 2:4 (based on your factor levels, this should be -(1/3) * (groupcontrol_t1 +groupcontrol_t2 + groupcontrol_t3)).

ADD REPLY • link 6.9 years ago by russhh 5.7k

0

Entering edit mode

Did you merge your affected_t1 group with something else? Anyway, prepend group to what I wrote and swap affected for treatment, since you apparently changed the names from your post.

Edit Silly me, the subject coefficients combine to form affected_t1.

ADD REPLY • link 6.9 years ago by Devon Ryan 104k

0

Entering edit mode

Hi guys, I have some questions about model.matrix design for paired samples

My data: samples collected longitudinally from 8 individuals treated from day 1 to day 17, DAY 0 is the control (no treatment)

#Model matrix
subject <- factor(c(rep(1:8,3), rep(5:8,6)))
timepoint <- factor(c(rep("DAY0",8), rep("DAY1",8), rep("DAY3",8), rep("DAY5",4), rep("DAY7",4),
                      rep("DAY10",4), rep("DAY12",4), rep("DAY15",4), rep("DAY17",4)))
sampleTable <- data.frame(subject = as.factor(subject),
                          timepoint = as.factor(timepoint))
design <- model.matrix(~0 + subject + timepoint)
contr.matrix <- makeContrasts(
  D17vsD0 = timepointDAY17, #1
  D15vsD0 = timepointDAY15, #2
  D12vsD0 = timepointDAY12, #3
  D10vsD0 = timepointDAY10, #4
  D7vsD0 = timepointDAY7, #5
  D5vsD0 = timepointDAY5, #6
  D3vsD0 = timepointDAY3, #7
  D1vsD0 = timepointDAY1, #8
  levels=design) 

#Count data
x <- counts.cpm.filtered
DGEx <- DGEList(counts=x)
DGEx <- calcNormFactors(DGEx, method = "TMM")

#Stats
v <- voom(DGEx, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
summary(decideTests(efit))

Q1: I saw many model.matrix for paired samples including or not ~0 on the design. What are the implications of using ~0 (intercept)? How it works?

Q2: As DAY0 is not on colnames of my design (is the intercept), I was not able to specify it on my contrast. The contrast will do comparisons against DAY0?

Q3: I have no technical replicates, It is ok to apply this model?

ADD REPLY • link 3.6 years ago by starick.marick ▴ 10