Suppose you have tumor and normal samples treated with compound A with 0, 1, 2 mM for 0, 1, 2 hours, and you want to do DGE analysis using edgeR. You want to answer the question "which genes are differentially expressed in response to compound A": after taking into account of the impact of disease condition, time of treatment, and concentration of A (noted as [A]), you want to focus on the relationship of treatment by A and gene expression, and don't care about disease condition, time of treatment, not even [A].
How do you do it in edgeR?
In R, assuming the log(expression) being normally distributed, you could simply do glm(log(expression) ~ disease_condition + [A] + time. Then you only need to check the slope of [A] and its p-value, right? But since log(expression) is not normally distributed, we cannot use glm in R.
While in edgeR, things seem to be much more complicated. First, both time and [A] cannot be considered to be continuous (correct me if I am wrong); Second, I don't know how to do the DGE analysis.
For simplicity, let's forget about time in the problem -- just consider disease condition and [A] (0 & 1 mM), each with a replicate. Your gene count file looks like the following (T for tumor, N for normal; A0 for [A] = 0, A1 for [A] = 1; column 2~9 are gene counts for 8 samples, one column for one sample):
GENE T_A0 T_A0 T_A1 T_A1 N_A0 N_A0 N_A1 N_A1
GENE_A ...
GENE_B ...
...
You will have four levels: T_A0, T_A1, N_A0, N_A1.
Your code snippet looks like the following:
y <- DGEList(counts=d, group=grp);
y <- calcNormFactors(y);
dsgn <- model.matrix(~0+grp); # not allowing intercept
y <- estimateDisp(y, dsgn);
fit <-glmFit(y, dsgn)
The question is how to construct your test (or contrast). Assuming the levels are ordered as above, what would be your contrast? c(-0.5, 0.5, -0.5, 0.5) or something else? Or is there a better approach? Or am I terribly wrong?
Thanks in advance!