Question: How to perform edgeR based on serial paired samples? (model.matrix design)
3
3.3 years ago by
CandiceChuDVM2.1k
United States/College Station/Texas A&M University
CandiceChuDVM2.1k wrote:

Hi all,

I have 4 paired samples in treatment vs control at 3 different time points.
How can I set up a `model.matrix` so I can treat them as paired samples and compare the following comparisons in `glmLRT` of edgeR:

1. Treatment vs control (paired samples), regardless of time
2. Treatment vs control (paired samples) at T1
3. Treatment vs control (paired samples) at T2
4. Treatment vs control (paired samples) at T3

Tentative code:

``````subject <- factor(c(rep("1", 3),rep("2", 2),rep("3", 3),rep("4", 3),
rep("1", 3),rep("2", 3),rep("3", 3),rep("4", 3)))
condition <- factor(c(rep("treatment",11), rep("control",12)), levels=c("control","treatment")) #missing one treatment data
timepoints <- factor(c("t1","t2","t3","t1","t2","t1","t2","t3","t1","t2","t3",
"t1","t2","t3","t1","t2","t3","t1","t2","t3",
"t1","t2","t3"), levels=c("t1","t2","t3"))
sampleTable <- data.frame(subject = as.factor(subject),
condition = as.factor(condition),
timepoints = as.factor(timepoints))
design <- model.matrix(~subject+condition+timepoints+condition:timepoints)
fit <- glmFit(y, design)
``````

Can my `model.matrix` fulfill what I need?

Sample table:

subject condition timepoints
1 treatment t1
1 treatment t2
1 treatment t3
2 treatment t1
2 treatment t2
3 treatment t1
3 treatment t2
3 treatment t3
4 treatment t1
4 treatment t2
4 treatment t3
1 control t1
1 control t2
1 control t3
2 control t1
2 control t2
2 control t3
3 control t1
3 control t2
3 control t3
4 control t1
4 control t2
4 control t3

rna-seq edger R • 2.4k views
modified 3.3 years ago • written 3.3 years ago by CandiceChuDVM2.1k
1

Not an actual answer but just a question I got by looking at your design matrix, shouldn't you add a +0 such that the first column doesn't represent the intercept? (`design <- model.matrix(~0+subject+condition+timepoints+condition:timepoints)`). Also as I was writing this I checked out the manual and this seems very similar to the section 3.3 of the edgeR User Guide, perhaps it is worth a look?

3

The intercept doesn't matter, some people replace it since it makes their contrasts easier to specify or explain.

3
3.3 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

Given all of the contrasts you want to perform, you'd be best served making groups out of your dataset:

``````subject group
1       treatment_t1
1       treatment_t2
1       treatment_t3
2       treatment_t1
...
``````

Then use a design of `~0 + subject + group`, after which you can more easily provide the contrasts you want:

1. `(treatment_t1+treatment_t2+treatment_t3)/3 - (control_t1+control_t2+control_t3)/3`
2. `treatment_t1 - control_t1`
3. `treatment_t2 - control_t2`
4. `treatment_t3 - control_t3`
1

I agree to what Devon suggested. Subject and group should be better to build the model matrix and then proceed with your contracts. Just to add up. Seems like you have paired data. Won't the tests also specify that while computing the contracts specific DE? I believe here paired testing should come into play.

2

The pairing already took place in the model fit, so unless someone wants to look at per-subject changes it's not going to be mentioned subsequently.

1

Ah nice, this is nice. Have not had paired data for a while probably that is the reason I was confused. Thanks for the clarification. Something again I learned today.

1

Hi Devon,

When I apply `design <- model.matrix(~0 + subject + group)`, the `colnames(design)` are:

``````[1] "subject1"         "subject2"         "subject3"         "subject4"
[5] "groupaffected_t2" "groupaffected_t3" "groupcontrol_t1"  "groupcontrol_t2"
[9] "groupcontrol_t3"
``````

In this situation, how can I specifically choose `coef` or `contrast` in `lrt <- glmLRT(fit)` to fulfill the contrast you proposed?

Thanks a lot!

1

Think about it: What is the fitted value for the each subject's treatment-induced difference at time1:

subject1: (subject1) - (subject1 + groupcontrol_t1) = -groupcontrol_t1 # I'd have put the treatment arm as the second level myself

subject2: (subject2) - (subject2 + groupcontrol_t1) = -groupcontrol_t1

... and so on for the other two subjects. The average difference across the subjects is therefore '-groupcontrol_t1'; this is contrast 2 in Devon's answer.

The other single-time contrasts are worked out similarly.

The first contrast is just the average of contrasts 2:4 (based on your factor levels, this should be -(1/3) * (groupcontrol_t1 +groupcontrol_t2 + groupcontrol_t3)).

Did you merge your `affected_t1` group with something else? Anyway, prepend `group` to what I wrote and swap `affected` for `treatment`, since you apparently changed the names from your post.

Edit Silly me, the subject coefficients combine to form `affected_t1`.

Hi guys, I have some questions about model.matrix design for paired samples

My data: samples collected longitudinally from 8 individuals treated from day 1 to day 17, DAY 0 is the control (no treatment)

``````#Model matrix
subject <- factor(c(rep(1:8,3), rep(5:8,6)))
timepoint <- factor(c(rep("DAY0",8), rep("DAY1",8), rep("DAY3",8), rep("DAY5",4), rep("DAY7",4),
rep("DAY10",4), rep("DAY12",4), rep("DAY15",4), rep("DAY17",4)))
sampleTable <- data.frame(subject = as.factor(subject),
timepoint = as.factor(timepoint))
design <- model.matrix(~0 + subject + timepoint)
contr.matrix <- makeContrasts(
D17vsD0 = timepointDAY17, #1
D15vsD0 = timepointDAY15, #2
D12vsD0 = timepointDAY12, #3
D10vsD0 = timepointDAY10, #4
D7vsD0 = timepointDAY7, #5
D5vsD0 = timepointDAY5, #6
D3vsD0 = timepointDAY3, #7
D1vsD0 = timepointDAY1, #8
levels=design)

#Count data
x <- counts.cpm.filtered
DGEx <- DGEList(counts=x)
DGEx <- calcNormFactors(DGEx, method = "TMM")

#Stats
v <- voom(DGEx, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
summary(decideTests(efit))
``````

Q1: I saw many `model.matrix` for paired samples including or not `~0` on the `design`. What are the implications of using `~0` (intercept)? How it works?

Q2: As DAY0 is not on colnames of my `design` (is the intercept), I was not able to specify it on my contrast. The contrast will do comparisons against DAY0?

Q3: I have no technical replicates, It is ok to apply this model?