DESeq2 contrasts - creating different objects or not?
2
2
Entering edit mode
2.8 years ago
r_mvl ▴ 30

Hello,

I have what might be a simple question but I would really like to get this right and despite all my readings it is still not clear to me what the best option is.

I am using DESeq2 to analyse an RNAseq experiment. The design of the experiment is as follow (each sample is in triplicate):

• cell line A - untrt
• cell line A - trt
• cell line B - untrt
• cell line B - trt

The goal is to peform the following contrats:

• cell line A - untrt versus cell line B - untrt
• cell line A - trt versus cell line B - trt
• cell line A - untrt versus cell line A - trt
• cell line B - untrt versus cell line B - trt

My question is, in order to do things "correctly", should I create a separate DESeq object for each contrast using only the samples I will be using for each contrast, or should I have just a single object from which I do the contrasts that I need?

For your information, the difference between these cell lines is that "cell line A" is overexpressing gene X, while "cell line B" is overexpressing genes X and Y.

Any help would be very much appreciated...

RNA-Seq DESeq2 experimental design • 3.6k views
1
Entering edit mode

Keep everything in one datastructure. Specify the required contrasts separately. Have you considered that there may be genes that are differentially-(induced-by-treatment) between the two cell lines - your current contrasts don't cover that.

0
Entering edit mode

Thanks. Yes absolutely, but in the first place we are not interested in this.

2
Entering edit mode
2.8 years ago

To make the comparisons you want, your sample data should include a column like cellline.treatment, with values like A.treat, A.untreat.

You make one dds object with cellline.treatment as the design with all the samples, and call results 4 times, each time specifying which cellline.treatment to compare to which.

0
Entering edit mode

Can you explain why? Clearly there is a performance benefit to using the approach you describe, but is there a theoretical reason this is the correct way to do this? I have a similar (though more complex) data set, and while the LFC estimates are very similar under either approach, the p-values show some differences (which really makes a difference once you start filtering by adjusted p-value, of course).

0
Entering edit mode

Could you explain the design/strategy you have actually used please James. Have you used a different design matrix (eg, ~ cell_line * treatment rather than ~ amalgamated_cellLine_and_treatment ) or have you used several, separate, models?

1
Entering edit mode

It's an interaction, but I'm doing it (in both cases) using the numerical form of the contrast parameter. So to be concrete, here's an example which mimics the design I'm using:

set.seed(1)
dds <- makeExampleDESeqDataSet(m=24)
colData(dds)$treatment <- factor(rep(rep(c('untreated', 'treated'), each=6),2)) colData(dds)$timepoint <- factor(rep(rep(c('tp1', 'tp2'), each=3),4))


So now there are three variables: condition, treatment, and timepoint; each are two-level factors. I want (for example) ~condition*treatment at timepoint tp1, which I can do two ways:

# create separate DESeqDataSet object:
dds_subset <- dds[,colData(dds)$timepoint=='tp1'] colData(dds_subset)$group <- factor(paste(colData(dds_subset)$condition, colData(dds_subset)$treatment,
colData(dds_subset)$timepoint, sep='_')) design(dds_subset) <- ~ group dds_subset <- DESeq(dds_subset) resultsNames(dds_subset) # [1] "Intercept" # [2] "group_A_untreated_tp1_vs_A_treated_tp1" # [3] "group_B_treated_tp1_vs_A_treated_tp1" # [4] "group_B_untreated_tp1_vs_A_treated_tp1"  Now I can get the desired interaction term with: res1 <- results(dds_subset, contrast=c(0, -1, -1, 1)) # (B untreated - B treated) - (A untreated - A treated)  You can verify that this gives essentially identical results to design(dds_subset) <- condition * treatment dds_subset <- DESeq(dds_subset) res1 <- results(dds_subset)  But using the full data set:  colData(dds)$group <- factor(paste(colData(dds)$condition, colData(dds)$treatment,
colData(dds)\$timepoint, sep='_'))
design(dds) <- ~ group
dds <- DESeq(dds)

resultsNames(dds)
# [1] "Intercept"
# [2] "group_A_treated_tp2_vs_A_treated_tp1"
# [3] "group_A_untreated_tp1_vs_A_treated_tp1"
# [4] "group_A_untreated_tp2_vs_A_treated_tp1"
# [5] "group_B_treated_tp1_vs_A_treated_tp1"
# [6] "group_B_treated_tp2_vs_A_treated_tp1"
# [7] "group_B_untreated_tp1_vs_A_treated_tp1"
# [8] "group_B_untreated_tp2_vs_A_treated_tp1"

res2 <- results(dds, contrast=c(0, 0, -1, 0, -1, 0, 1, 0)) # (B untreated - B treated) - (A untreated - A treated) [tp1]


The log2FoldChange of res1 and res2 are similar (not exactly identical, as would be expected because the normalization introduces some differences), but the p-values for some genes show more pronounced differences (this is much more exaggerated in my real data set, which obviously has more genes and real differences between the groups). This has a more exaggerated effect in the padj, and then when you start imposing thresholds on padj the differences can be huge in the resulting gene sets. (Note here, I fully understand that part of this is due to the fact that we're using essentially artificial thresholds on the tails of distributions, but for some downstream analyses this becomes something of a necessary evil. The question is whether there's an a priori reason to choose one approach over the other...)

Or am I missing something here, and not really doing the analysis I think I'm doing?

2
Entering edit mode
2.8 years ago
Ashastry ▴ 60

"should I create a separate DESeq object for each contrast using only the samples"

Create one object.

and then for comparison,

do this - res, comparison1<-results(dds, contrast = c("A.untreated", "B.treated")) and so on.

You can list the contrast names in the object by doing resultNames(object).