Question

Differential expression - design matrix with multiple factors and paired samples

1

Entering edit mode

7.1 years ago

ivanarg2 ▴ 80

I'm analyzing gene expression data from an experiment where I have (i) 6 different mice (A:D); (ii) 2 different time points ("pre" and "post" treatment); and (iii) 2 different response types ("responder" vs. "non-responder"). I'm trying to find genes with differential expression after treatment in the responders vs. non-responders. In other words: if I look at post therapy gene expression changes compared to 'pre', which genes are differentially expressed in responders compared to non-responders? Additionally, how can I include the 'paired' aspect into my analysis, where I might get more meaningful information by running the above analysis in a paired, rather than general, manner?

I've spent an entire day reading tutorials on designing design matrices for differential expression but I can't pin down the details, I'm hoping someone with experience might be able to help out.

A small reproducible example:

ms = factor(rep(c("A", "B", "C", "D", "E", "F"), each = 2)) # Mouse ID
rx = factor(rep(c("Pre", "Post"), 6)) # Pre or Post treatment
rp = factor(c(rep("NR", 6), rep("R", 6))) # Is the mouse a 'responder' or 'non-responder'?
data.frame(ms, rx, rp)
   ms   rx rp
1   A  Pre NR
2   A Post NR
3   B  Pre NR
4   B Post NR
5   C  Pre NR
6   C Post NR
7   D  Pre  R
8   D Post  R
9   E  Pre  R
10  E Post  R
11  F  Pre  R
12  F Post  R

The ideal analysis that I'd like to do is something like the following, but including the paired design, which would hopefully make the analysis more powerful:

(Post / Pre in responders) / (Post / Pre in non-responders)

The first term ("Post / Pre in responders") would find DE genes in responders, and by dividing by the second term ("Post / Pre in non-responders") I would ideally get genes which are diferentially expressed only in responders.

I tried running the "Post / Pre" analysis in responders and in non-responders separately, and manually dividing, as in the code above, but it doesn't feel right, and there must be a true way of running that with a properly designed matrix.

RNA-Seq limma • 2.3k views

ADD COMMENT • link updated 7.1 years ago by Devon Ryan 105k • written 7.1 years ago by ivanarg2 ▴ 80

score 4 · Accepted Answer · 2018-06-13

You'll want a design of ~ms + rx*rp and the coefficient of primary interest is then the rx:rp interaction. That then accounts for the pairing and gives you the interaction you want. Note that you'll probably get an error about the matrix being of insufficient rank, in which case you should use ms = factor(c(rep(rep(c("A","B","C"), each=2), 2)). This will keep the pairing, but change things such that each of the groups contains a mouse with the same ID (otherwise correcting for the pairing will confound fitting the response group).