Complex DESeq2 sample file set up
1
0
Entering edit mode
13 days ago
rva_jango ▴ 10

I am analyzing several clinical experiments in DESeq2.

Half of the samples are control (ctrl), nine samples are treated with combined drug (combined), and two are treated with single drug (single).

Sampleinfo
X treatment     drug
1  10A      ctrl     none
2  10B   therapy combined
3  12A      ctrl     none
4  12B   therapy combined
5  13A      ctrl     none
6  13B   therapy combined
7  16A      ctrl     none
8  16B   therapy combined
9  19A      ctrl     none
10 19B   therapy combined
11 24A      ctrl     none
12 24B   therapy   single
13  2A      ctrl     none
14  2B   therapy   single
15 34A      ctrl     none
16 34B   therapy combined
17  6A      ctrl     none
18  6B   therapy combined
19  7A      ctrl     none
20  7B   therapy combined
21  9A      ctrl     none
22  9B   therapy combined


all(Sampleinfo$Sample == colnames(count_data)) 1 TRUE I want to analyze differential expression of: A) difference between untreated (ctrl) and combined B) difference between untreated (ctrl) and single C) difference between combined and single treatment, with ctrl as the untreated ctrl I am mostly interested in question C, as A and B have already been described. I have found some examples of complex design, but am unclear how to make my sample info table so that I can perform these complex designs. Any input appreciated. Maybe it has to do with factor levels?? Note on factor levels By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels. In order to see the change of reference levels reflected in the results names, you need to either run DESeq or nbinomWaldTest/nbinomLRT after the re-leveling operation. Setting the factor levels can be done in two ways, either using factor: dds$condition <- factor(dds$condition, levels = c("untreated","treated")) …or using relevel, just specifying the reference level: dds$condition <- relevel(dds$condition, ref = "untreated") If you need to subset the columns of a DESeqDataSet, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed. In this case, the droplevels function can be used to remove those levels which do not have samples in the current DESeqDataSet: dds$condition <- droplevels(dds$condition) complex design DESEq2 RNAseq • 127 views ADD COMMENT 3 Entering edit mode 13 days ago Hi, please create a new column that is a compound of treatment and drug, i.e., Sampleinfo$group <- paste0(Sampleinfo$treatment, '_', Sampleinfo$drug)


Then, your design will be ~ group

You can then do the comparisons that you want in the standard way. For example, see: DESeq2 compare all levels

res <- results(dds,
contrast = c('group', 'therapy_combined', 'ctrl_none'),
independentFiltering = TRUE,
alpha = 0.05,
parallel = TRUE)
res <- lfcShrink(dds,
contrast = c('group', 'therapy_combined', 'ctrl_none'),
res = res)


Kevin

1
Entering edit mode

Thank you that worked, mostly.

I had an error thrown, and went down another biostars answer to make it work. DESeq2 compare all levels

The crux is that you have to modify the reference level of your design so the comparison of interest becomes available via resultsNames.

Here is an example:

library(DESeq2)

# / Example data with three levels: dds <- makeExampleDESeqDataSet() dds$condition <- factor(unlist(lapply(seq(1,3),function(x) rep(LETTERS[x], 4)))) # / standard workflow dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds) resultsNames(dds) 1 "Intercept" "condition_B_vs_A" "condition_C_vs_A" As you see you have B_vs_A and C_vs_A for which the MLEs that apeglm needs are available. In case you want to shrink the B_vs_C comparison you would need to relevel the condition. You either have to make B or C the reference. Lets take C. relevel(dds$condition, ref = "C")

dds\$condition 1 A A A A B B B B C C C C Levels: C A B

The design is the same (~condition), only the reference level changed, and since it is the same so we do not need to rerun the dispersion estimation but only the Wald test to get new MLE coefficients for the comparison of interest.

dds <- nbinomWaldTest(dds) resultsNames(dds) 1 "Intercept"
"condition_A_vs_C" "condition_B_vs_C"

B_vs_C is now present so we can proceed.

lfcShrink(dds = dds, coef = 3, type = "apeglm")

It is on you to decide whether you want to do that or simply use ashr. Based on the apeglm paper ashr seems to perform decently as well. Don't use type="normal" though, the paper clearly states that it underperforms on comparison to the other two.