I am analyzing several clinical experiments in DESeq2.
Half of the samples (11) are untreated controls (ctrl), nine are treated with the combined drug (combined), and two are treated with a single drug (single).
Sampleinfo
     X treatment     drug
1  10A      ctrl     none
2  10B   therapy combined
3  12A      ctrl     none
4  12B   therapy combined
5  13A      ctrl     none
6  13B   therapy combined
7  16A      ctrl     none
8  16B   therapy combined
9  19A      ctrl     none
10 19B   therapy combined
11 24A      ctrl     none
12 24B   therapy   single
13  2A      ctrl     none
14  2B   therapy   single
15 34A      ctrl     none
16 34B   therapy combined
17  6A      ctrl     none
18  6B   therapy combined
19  7A      ctrl     none
20  7B   therapy combined
21  9A      ctrl     none
22  9B   therapy combined
all(Sampleinfo$Sample == colnames(count_data))
[1] TRUE
I want to analyze differential expression for three comparisons:
A) untreated (ctrl) vs. combined
B) untreated (ctrl) vs. single
C) combined vs. single treatment, with ctrl as the untreated baseline
I am mostly interested in question C, as A and B have already been described.
I have found some examples of complex designs, but it is unclear to me how to set up my sample info table so that I can fit them. Any input appreciated.
Maybe it has to do with factor levels??
Note on factor levels
By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factor levels. In order to see the change of reference levels reflected in the results names, you need to either run DESeq or nbinomWaldTest/nbinomLRT after the re-leveling operation. Setting the factor levels can be done in two ways, either using factor:
dds$condition <- factor(dds$condition, levels = c("untreated","treated"))
…or using relevel, just specifying the reference level:
dds$condition <- relevel(dds$condition, ref = "untreated")
If you need to subset the columns of a DESeqDataSet, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed. In this case, the droplevels function can be used to remove those levels which do not have samples in the current DESeqDataSet:
dds$condition <- droplevels(dds$condition)
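Applied to the sample table from the question, the re-leveling might look like the sketch below. It uses base R only. The table is retyped from the post, Sampleinfo is the name used there, and therapy_only is a hypothetical name introduced here just to illustrate droplevels.

```r
# Sample table retyped from the question (columns X, treatment, drug).
ids <- c("10", "12", "13", "16", "19", "24", "2", "34", "6", "7", "9")
therapy_drug <- c("combined", "combined", "combined", "combined", "combined",
                  "single", "single", "combined", "combined", "combined", "combined")
Sampleinfo <- data.frame(
  X         = as.vector(rbind(paste0(ids, "A"), paste0(ids, "B"))),
  treatment = rep(c("ctrl", "therapy"), times = length(ids)),
  drug      = as.vector(rbind("none", therapy_drug)),
  stringsAsFactors = FALSE
)

# Left alone, R orders levels alphabetically, so "combined" becomes the reference.
levels(factor(Sampleinfo$drug))  # "combined" "none" "single"

# Make "none" (the ctrl samples) the reference level explicitly.
Sampleinfo$drug <- factor(Sampleinfo$drug, levels = c("none", "combined", "single"))
table(Sampleinfo$drug)

# Subsetting to treated samples leaves an empty "none" level; droplevels removes it.
therapy_only <- droplevels(Sampleinfo[Sampleinfo$treatment == "therapy", ])
levels(therapy_only$drug)  # "combined" "single"
```

With drug as a single three-level factor and "none" as the reference, each of the three comparisons becomes a pairwise contrast between two levels of that one factor.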
Thank you, that worked, mostly.
An error was thrown, and I followed another Biostars answer, "DESeq2 compare all levels", to make it work.
Thank you again. Follow-up question: after running the DESeq2 analysis, which produces a results table with log2 fold changes and p-values, and checking dispersion, I can run lfcShrink for different comparisons to help with visualization of fold changes that can vary widely.
Question: I am interested in comparing therapy_single to therapy_combined, with ctrl_none as the baseline state. Am I correct to run the analysis this way, repeating lfcShrink for each comparison?
Or would it be advisable to use interactions up front for this nuanced type of question, as I am interested in heightened gene expression or repression vs. different pathways being affected?
Any help appreciated.
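One way to frame the three comparisons, offered as a rough sketch rather than a definitive answer: fit a single model with the three-level drug factor and pull each comparison out with the contrast argument, shrinking only the comparison of interest. The counts below are simulated with makeExampleDESeqDataSet purely so the code runs on its own; the real count_data and sample table would take their place. The design ~ drug and the choice of type = "ashr" (which needs the ashr package) are assumptions of this sketch, not something stated in the thread.

```r
library(DESeq2)

set.seed(1)
# Simulated placeholder counts (500 genes x 22 samples); substitute the real data.
dds <- makeExampleDESeqDataSet(n = 500, m = 22)

# Same group sizes as in the question: 11 none, 9 combined, 2 single.
dds$drug <- factor(c(rep("none", 11), rep("combined", 9), rep("single", 2)),
                   levels = c("none", "combined", "single"))
design(dds) <- ~ drug

dds <- DESeq(dds)
resultsNames(dds)  # "Intercept" "drug_combined_vs_none" "drug_single_vs_none"

# A) combined vs. ctrl, B) single vs. ctrl, C) single vs. combined
res_A <- results(dds, contrast = c("drug", "combined", "none"))
res_B <- results(dds, contrast = c("drug", "single", "none"))
res_C <- results(dds, contrast = c("drug", "single", "combined"))

# Shrunken log2 fold changes for comparison C. type = "ashr" accepts a contrast;
# it is skipped here if the ashr package is not installed.
if (requireNamespace("ashr", quietly = TRUE)) {
  shr_C <- lfcShrink(dds, contrast = c("drug", "single", "combined"), type = "ashr")
}
```

DESeq only needs to be run once; repeating results and lfcShrink per comparison is the usual pattern. An interaction term does not seem to apply here, because treatment and drug are nested (every ctrl sample is "none"), so a design such as ~ treatment + drug is not full rank. If the A and B samples with the same number come from the same patient, which the post does not state, a patient term could be added to the design (for example ~ patient + drug) to account for the pairing.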