Question: How to perform edgeR based on serial paired samples? (model.matrix design)
3
gravatar for CandiceChuDVM
3.3 years ago by
CandiceChuDVM2.1k
United States/College Station/Texas A&M University
CandiceChuDVM2.1k wrote:

Hi all,

I have 4 paired samples in treatment vs control at 3 different time points.
How can I set up a model.matrix so I can treat them as paired samples and compare the following comparisons in glmLRT of edgeR:

  1. Treatment vs control (paired samples), regardless of time
  2. Treatment vs control (paired samples) at T1
  3. Treatment vs control (paired samples) at T2
  4. Treatment vs control (paired samples) at T3

Tentative code:

subject <- factor(c(rep("1", 3),rep("2", 2),rep("3", 3),rep("4", 3),
                rep("1", 3),rep("2", 3),rep("3", 3),rep("4", 3)))
condition <- factor(c(rep("treatment",11), rep("control",12)), levels=c("control","treatment")) #missing one treatment data
timepoints <- factor(c("t1","t2","t3","t1","t2","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3","t1","t2","t3","t1","t2","t3",
                   "t1","t2","t3"), levels=c("t1","t2","t3"))
sampleTable <- data.frame(subject = as.factor(subject),
                    condition = as.factor(condition),
                      timepoints = as.factor(timepoints))
design <- model.matrix(~subject+condition+timepoints+condition:timepoints)
fit <- glmFit(y, design)

Can my model.matrix fulfill what I need?

Sample table:

subject condition timepoints
1 treatment t1
1 treatment t2
1 treatment t3
2 treatment t1
2 treatment t2
3 treatment t1
3 treatment t2
3 treatment t3
4 treatment t1
4 treatment t2
4 treatment t3
1 control t1
1 control t2
1 control t3
2 control t1
2 control t2
2 control t3
3 control t1
3 control t2
3 control t3
4 control t1
4 control t2
4 control t3

rna-seq edger R • 2.4k views
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by CandiceChuDVM2.1k
1

Not an actual answer but just a question I got by looking at your design matrix, shouldn't you add a +0 such that the first column doesn't represent the intercept? (design <- model.matrix(~0+subject+condition+timepoints+condition:timepoints)). Also as I was writing this I checked out the manual and this seems very similar to the section 3.3 of the edgeR User Guide, perhaps it is worth a look?

ADD REPLYlink written 3.3 years ago by biofalconch470
3

The intercept doesn't matter, some people replace it since it makes their contrasts easier to specify or explain.

ADD REPLYlink written 3.3 years ago by russhh5.5k
3
gravatar for Devon Ryan
3.3 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

Given all of the contrasts you want to perform, you'd be best served making groups out of your dataset:

subject group
1       treatment_t1
1       treatment_t2
1       treatment_t3
2       treatment_t1
...

Then use a design of ~0 + subject + group, after which you can more easily provide the contrasts you want:

  1. (treatment_t1+treatment_t2+treatment_t3)/3 - (control_t1+control_t2+control_t3)/3
  2. treatment_t1 - control_t1
  3. treatment_t2 - control_t2
  4. treatment_t3 - control_t3
ADD COMMENTlink written 3.3 years ago by Devon Ryan97k
1

I agree to what Devon suggested. Subject and group should be better to build the model matrix and then proceed with your contracts. Just to add up. Seems like you have paired data. Won't the tests also specify that while computing the contracts specific DE? I believe here paired testing should come into play.

ADD REPLYlink written 3.3 years ago by ivivek_ngs5.0k
2

The pairing already took place in the model fit, so unless someone wants to look at per-subject changes it's not going to be mentioned subsequently.

ADD REPLYlink written 3.3 years ago by Devon Ryan97k
1

Ah nice, this is nice. Have not had paired data for a while probably that is the reason I was confused. Thanks for the clarification. Something again I learned today.

ADD REPLYlink written 3.3 years ago by ivivek_ngs5.0k
1

Hi Devon,

When I apply design <- model.matrix(~0 + subject + group), the colnames(design) are:

[1] "subject1"         "subject2"         "subject3"         "subject4"        
[5] "groupaffected_t2" "groupaffected_t3" "groupcontrol_t1"  "groupcontrol_t2" 
[9] "groupcontrol_t3"

In this situation, how can I specifically choose coef or contrast in lrt <- glmLRT(fit) to fulfill the contrast you proposed?

Thanks a lot!

ADD REPLYlink written 3.3 years ago by CandiceChuDVM2.1k
1

Think about it: What is the fitted value for the each subject's treatment-induced difference at time1:

subject1: (subject1) - (subject1 + groupcontrol_t1) = -groupcontrol_t1 # I'd have put the treatment arm as the second level myself

subject2: (subject2) - (subject2 + groupcontrol_t1) = -groupcontrol_t1

... and so on for the other two subjects. The average difference across the subjects is therefore '-groupcontrol_t1'; this is contrast 2 in Devon's answer.

The other single-time contrasts are worked out similarly.

The first contrast is just the average of contrasts 2:4 (based on your factor levels, this should be -(1/3) * (groupcontrol_t1 +groupcontrol_t2 + groupcontrol_t3)).

ADD REPLYlink written 3.3 years ago by russhh5.5k

Did you merge your affected_t1 group with something else? Anyway, prepend group to what I wrote and swap affected for treatment, since you apparently changed the names from your post.

Edit Silly me, the subject coefficients combine to form affected_t1.

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Devon Ryan97k

Hi guys, I have some questions about model.matrix design for paired samples

My data: samples collected longitudinally from 8 individuals treated from day 1 to day 17, DAY 0 is the control (no treatment)

#Model matrix
subject <- factor(c(rep(1:8,3), rep(5:8,6)))
timepoint <- factor(c(rep("DAY0",8), rep("DAY1",8), rep("DAY3",8), rep("DAY5",4), rep("DAY7",4),
                      rep("DAY10",4), rep("DAY12",4), rep("DAY15",4), rep("DAY17",4)))
sampleTable <- data.frame(subject = as.factor(subject),
                          timepoint = as.factor(timepoint))
design <- model.matrix(~0 + subject + timepoint)
contr.matrix <- makeContrasts(
  D17vsD0 = timepointDAY17, #1
  D15vsD0 = timepointDAY15, #2
  D12vsD0 = timepointDAY12, #3
  D10vsD0 = timepointDAY10, #4
  D7vsD0 = timepointDAY7, #5
  D5vsD0 = timepointDAY5, #6
  D3vsD0 = timepointDAY3, #7
  D1vsD0 = timepointDAY1, #8
  levels=design) 

#Count data
x <- counts.cpm.filtered
DGEx <- DGEList(counts=x)
DGEx <- calcNormFactors(DGEx, method = "TMM")

#Stats
v <- voom(DGEx, design, plot=TRUE)
vfit <- lmFit(v, design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
summary(decideTests(efit))

Q1: I saw many model.matrix for paired samples including or not ~0 on the design. What are the implications of using ~0 (intercept)? How it works?

Q2: As DAY0 is not on colnames of my design (is the intercept), I was not able to specify it on my contrast. The contrast will do comparisons against DAY0?

Q3: I have no technical replicates, It is ok to apply this model?

ADD REPLYlink modified 18 days ago • written 18 days ago by starick.marick0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1592 users visited in the last hour