I've been trying to look at differential expression in 33 samples (design file below) with DESeq2 in R and I am unsure about the best way to analyse it.
sample line drug disease time A1 A Yes No 3 A2 A Yes No 3 A3 A Yes No 3 A4 A Placebo No 3 A5 A Placebo No 3 A6 A Placebo No 3 A7 A Yes Yes 1 A8 A Yes Yes 1 A9 A Yes Yes 1 A10 A No Yes 1 A11 A No Yes 1 A12 A No Yes 1 B1 B No Yes 1 B2 B No Yes 1 B3 B No Yes 1 C1 C No Yes 1 C2 C No Yes 1 C3 C No Yes 1 A13 A No Yes 7 A14 A No Yes 7 A15 A No Yes 7 B4 B No Yes 7 B5 B No Yes 7 B6 B No Yes 7 C1 C No Yes 7 C2 C No Yes 7 C3 C No Yes 7 A16 A Yes Yes 7 A17 A Yes Yes 7 A18 A Yes Yes 7 A19 A Placebo Yes 7 A20 A Placebo Yes 7 A21 A Placebo Yes 7
I am mostly interested in pairwise comparisons but I would like the possibility to plot the expression values of selected genes across more than one comparison: for example, I want to look at the differentially expressed genes for line A, time point 3 and with no disease to compare the drug vs the placebo (so that would be: samples A1/A2/A3 vs A4/A5/A6) but also for line A, time point 7 and with no disease (so that would be: samples A16/A17/A18 vs A19/A20/A21). So far the plan has been to create a DESeq object for each one of the comparisons using ~time as a design and I can get lists of differentially expressed genes that way.
Now, what I would like to do is to be able to plot the expression of a selected gene (let's call it gene A) for all the 12 samples (A1/A2/A3/A4/A5/A6/A16/A17/A18/A19/A20/A21) but if I extract the counts from each one of my two DESeq objects it would not be comparable as the counts are normalised given the design matrix and the sample included in each model.
So my first question would be, should I include them all in a single model (with design ~time+drug) then look at pairwise comparisons using the results function from DESeq2 and specifying the selection comparison? And my second question, which is an extension of the first one, if I want to repeat this with different comparisons (comparing the line, or the disease state)/variables (line, drug, disease, time), what would be a general rule for determining which samples should be included in a single model and which ones should be separated?