/ Example data with three levels: dds <- makeExampleDESeqDataSet() dds$condition <- factor(unlist(lapply(seq(1,3),function(x)

Question

Complex DESeq2 sample file set up

0

Entering edit mode

2.9 years ago

rva_jango ▴ 10

I am analyzing several clinical experiments in DESeq2.

enter image description here

Half of the samples are control (ctrl), nine samples are treated with combined drug (combined), and two are treated with single drug (single).

Sampleinfo
     X treatment     drug
1  10A      ctrl     none
2  10B   therapy combined
3  12A      ctrl     none
4  12B   therapy combined
5  13A      ctrl     none
6  13B   therapy combined
7  16A      ctrl     none
8  16B   therapy combined
9  19A      ctrl     none
10 19B   therapy combined
11 24A      ctrl     none
12 24B   therapy   single
13  2A      ctrl     none
14  2B   therapy   single
15 34A      ctrl     none
16 34B   therapy combined
17  6A      ctrl     none
18  6B   therapy combined
19  7A      ctrl     none
20  7B   therapy combined
21  9A      ctrl     none
22  9B   therapy combined

all(Sampleinfo$Sample == colnames(count_data)) 1 TRUE

I want to analyze differential expression of: A) difference between untreated (ctrl) and combined B) difference between untreated (ctrl) and single C) difference between combined and single treatment, with ctrl as the untreated ctrl

I am mostly interested in question C, as A and B have already been described.

I have found some examples of complex design, but am unclear how to make my sample info table so that I can perform these complex designs. Any input appreciated.

enter image description here

Maybe it has to do with factor levels??

Note on factor levels

By default, R will choose a reference level for factors based on alphabetical order. Then, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels. There are two solutions: you can either explicitly tell results which comparison to make using the contrast argument (this will be shown later), or you can explicitly set the factors levels. In order to see the change of reference levels reflected in the results names, you need to either run DESeq or nbinomWaldTest/nbinomLRT after the re-leveling operation. Setting the factor levels can be done in two ways, either using factor:

dds$condition <- factor(dds$condition, levels = c("untreated","treated"))

…or using relevel, just specifying the reference level:

dds$condition <- relevel(dds$condition, ref = "untreated")

If you need to subset the columns of a DESeqDataSet, i.e., when removing certain samples from the analysis, it is possible that all the samples for one or more levels of a variable in the design formula would be removed. In this case, the droplevels function can be used to remove those levels which do not have samples in the current DESeqDataSet:

dds$condition <- droplevels(dds$condition)

complex design DESEq2 RNAseq • 1.1k views

ADD COMMENT • link 2.8 years ago by rva_jango ▴ 10

score 3 · Accepted Answer · 2021-06-10

3

Entering edit mode

2.9 years ago

Kevin Blighe 87k

Hi, please create a new column that is a compound of treatment and drug, i.e.,

Sampleinfo$group <- paste0(Sampleinfo$treatment, '_', Sampleinfo$drug)

Then, your design will be ~ group

You can then do the comparisons that you want in the standard way. For example, see: DESeq2 compare all levels

res <- results(dds,
  contrast = c('group', 'therapy_combined', 'ctrl_none'),
  independentFiltering = TRUE,
  alpha = 0.05,
  pAdjustMethod = 'BH',
  parallel = TRUE)
res <- lfcShrink(dds,
  contrast = c('group', 'therapy_combined', 'ctrl_none'),
  res = res)

Kevin

ADD COMMENT • link 2.9 years ago by Kevin Blighe 87k

1

Entering edit mode

Thank you that worked, mostly.

I had an error thrown, and went down another biostars answer to make it work. DESeq2 compare all levels

The crux is that you have to modify the reference level of your design so the comparison of interest becomes available via resultsNames.

Here is an example:

library(DESeq2)

/ Example data with three levels: dds <- makeExampleDESeqDataSet() dds$condition <- factor(unlist(lapply(seq(1,3),function(x)

rep(LETTERS[x], 4))))

/ standard workflow dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) dds <- nbinomWaldTest(dds) resultsNames(dds)

1 "Intercept" "condition_B_vs_A" "condition_C_vs_A"

As you see you have B_vs_A and C_vs_A for which the MLEs that apeglm needs are available. In case you want to shrink the B_vs_C comparison you would need to relevel the condition. You either have to make B or C the reference. Lets take C.

relevel(dds$condition, ref = "C")

dds$condition 1 A A A A B B B B C C C C Levels: C A B

The design is the same (~condition), only the reference level changed, and since it is the same so we do not need to rerun the dispersion estimation but only the Wald test to get new MLE coefficients for the comparison of interest.

dds <- nbinomWaldTest(dds) resultsNames(dds) 1 "Intercept"
"condition_A_vs_C" "condition_B_vs_C"

B_vs_C is now present so we can proceed.

lfcShrink(dds = dds, coef = 3, type = "apeglm")

It is on you to decide whether you want to do that or simply use ashr. Based on the apeglm paper ashr seems to perform decently as well. Don't use type="normal" though, the paper clearly states that it underperforms on comparison to the other two.

ADD REPLY • link 2.9 years ago by rva_jango ▴ 10

0

Entering edit mode

Thank you again. Follow up question. After running DESeq2 analysis which extracts a results file with log2 fold changes and p-values,

enter image description here

dataset <- DESeqDataSetFromMatrix(countData = count_data, 
                              colData = Sampleinfo, 
                              design = ~ group)
dds <- DESeq(dataset)

check dispersion: enter image description here

I can run different lfcShrink analysis to help with visualization of fold-changes that can vary widely.

res <- results(dds,
           contrast = c('group', 'therapy_combined', 'ctrl_none'),
           independentFiltering = TRUE,
           alpha = 0.05,
           pAdjustMethod = 'BH',
           parallel = TRUE)
#view results names
resultsNames(dds)
#view head of results table
head(res)

log2 fold change (MLE): group therapy_combined vs ctrl_none 
Wald test p-value: group therapy combined vs ctrl none

#lfcShrink
res <- lfcShrink(dds,
             contrast = c('group', 'therapy_combined', 'ctrl_none'),
             res = res)# this gives aplegm error code

#can use aplegm or ashr, need to relevel (above) so that comparison of interest becomes available via resultsNames.
#this explicitly labels ctrl_none as the comparator
relevel(dds$group, ref = "ctrl_none")
dds$group

#re-run lfcShrink, specifying type=apeglm and coef in position 3
res3 <- lfcShrink(dds = dds, coef = 3, type = "apeglm")
#re-run position 2
res2 <- lfcShrink(dds = dds, coef = 2, type = "apeglm")

Question: I am interested in comparing therapy_single to therapy_combined, with ctrl_none as the default state. Am I correct in running analysis in this way, repeating lfcShrink for different combinations?

DESeq2 compare all levels

If you have conducted 3 different differential expression comparisons, you can subset the results objects (res1, res2, res3) to include the statistically significant genes. You then subset your original data matrix for these genes.

log2cutoff <- 2 qvaluecutoff <- 0.05

sigGenes <- unique(c( rownames(subset(res1, padj<=qvaluecutoff & abs(log2FoldChange)>=log2cutoff)), rownames(subset(res2, padj<=qvaluecutoff & abs(log2FoldChange)>=log2cutoff)),
rownames(subset(res3, padj<=qvaluecutoff & abs(log2FoldChange)>=log2cutoff)), ))

heat <- assay(rld)[sigGenes,]

Or would it be advisable to use interactions up front for this nuanced type of question, as I am interested in heightened gene expression or repression vs. different pathways being affected?

DESeq2 design suggestions: one combined condition, or two separate conditions?

Yes, you can use interactions and make a kind of complicated contrast which will do what you want, but this way is far more readable.

Interactions are used to answer questions like "Show me the genes where the treatment attenuated or heightened the changes from T1 to T2". You could manually do this by taking the fold change of A at the two timepoints, and subtracting the fold change of B from both time points, and the further your result is from 0, the more the treatment changed the time effect, but you would have no p-values.

Any help appreciated.

ADD REPLY • link 2.8 years ago by rva_jango ▴ 10