Question: DESeq2 contrasts - creating different objects or not?
0
gravatar for r_mvl
19 months ago by
r_mvl10
r_mvl10 wrote:

Hello,

I have what might be a simple question but I would really like to get this right and despite all my readings it is still not clear to me what the best option is.

I am using DESeq2 to analyse an RNAseq experiment. The design of the experiment is as follow (each sample is in triplicate):

  • cell line A - untrt
  • cell line A - trt
  • cell line B - untrt
  • cell line B - trt

The goal is to peform the following contrats:

  • cell line A - untrt versus cell line B - untrt
  • cell line A - trt versus cell line B - trt
  • cell line A - untrt versus cell line A - trt
  • cell line B - untrt versus cell line B - trt

My question is, in order to do things "correctly", should I create a separate DESeq object for each contrast using only the samples I will be using for each contrast, or should I have just a single object from which I do the contrasts that I need?

For your information, the difference between these cell lines is that "cell line A" is overexpressing gene X, while "cell line B" is overexpressing genes X and Y.

Any help would be very much appreciated...

ADD COMMENTlink modified 19 months ago by Ashastry60 • written 19 months ago by r_mvl10
1

Keep everything in one datastructure. Specify the required contrasts separately. Have you considered that there may be genes that are differentially-(induced-by-treatment) between the two cell lines - your current contrasts don't cover that.

ADD REPLYlink written 19 months ago by russhh5.5k

Thanks. Yes absolutely, but in the first place we are not interested in this.

ADD REPLYlink written 19 months ago by r_mvl10
2
gravatar for swbarnes2
19 months ago by
swbarnes28.9k
United States
swbarnes28.9k wrote:

To make the comparisons you want, your sample data should include a column like cellline.treatment, with values like A.treat, A.untreat.

You make one dds object with cellline.treatment as the design with all the samples, and call results 4 times, each time specifying which cellline.treatment to compare to which.

ADD COMMENTlink written 19 months ago by swbarnes28.9k

Can you explain why? Clearly there is a performance benefit to using the approach you describe, but is there a theoretical reason this is the correct way to do this? I have a similar (though more complex) data set, and while the LFC estimates are very similar under either approach, the p-values show some differences (which really makes a difference once you start filtering by adjusted p-value, of course).

ADD REPLYlink written 14 months ago by James Denvir90

Could you explain the design/strategy you have actually used please James. Have you used a different design matrix (eg, ~ cell_line * treatment rather than ~ amalgamated_cellLine_and_treatment ) or have you used several, separate, models?

ADD REPLYlink written 14 months ago by russhh5.5k

It's an interaction, but I'm doing it (in both cases) using the numerical form of the contrast parameter. So to be concrete, here's an example which mimics the design I'm using:

set.seed(1)
dds <- makeExampleDESeqDataSet(m=24)
colData(dds)$treatment <- factor(rep(rep(c('untreated', 'treated'), each=6),2))
colData(dds)$timepoint <- factor(rep(rep(c('tp1', 'tp2'), each=3),4))

So now there are three variables: condition, treatment, and timepoint; each are two-level factors. I want (for example) ~condition*treatment at timepoint tp1, which I can do two ways:

# create separate DESeqDataSet object:
dds_subset <- dds[,colData(dds)$timepoint=='tp1']
colData(dds_subset)$group <- factor(paste(colData(dds_subset)$condition, 
                                                                      colData(dds_subset)$treatment, 
                                                                      colData(dds_subset)$timepoint, sep='_'))
design(dds_subset) <- ~ group
dds_subset <- DESeq(dds_subset)

resultsNames(dds_subset)
# [1] "Intercept"                             
# [2] "group_A_untreated_tp1_vs_A_treated_tp1"
# [3] "group_B_treated_tp1_vs_A_treated_tp1"  
# [4] "group_B_untreated_tp1_vs_A_treated_tp1"

Now I can get the desired interaction term with:

res1 <- results(dds_subset, contrast=c(0, -1, -1, 1)) # (B untreated - B treated) - (A untreated - A treated)

You can verify that this gives essentially identical results to

design(dds_subset) <- condition * treatment
dds_subset <- DESeq(dds_subset)
res1 <- results(dds_subset)

But using the full data set:

 colData(dds)$group <- factor(paste(colData(dds)$condition, 
                                                                      colData(dds)$treatment, 
                                                                      colData(dds)$timepoint, sep='_'))
design(dds) <- ~ group
dds <- DESeq(dds)

resultsNames(dds)
# [1] "Intercept"                             
# [2] "group_A_treated_tp2_vs_A_treated_tp1"  
# [3] "group_A_untreated_tp1_vs_A_treated_tp1"
# [4] "group_A_untreated_tp2_vs_A_treated_tp1"
# [5] "group_B_treated_tp1_vs_A_treated_tp1"  
# [6] "group_B_treated_tp2_vs_A_treated_tp1"  
# [7] "group_B_untreated_tp1_vs_A_treated_tp1"
# [8] "group_B_untreated_tp2_vs_A_treated_tp1"

res2 <- results(dds, contrast=c(0, 0, -1, 0, -1, 0, 1, 0)) # (B untreated - B treated) - (A untreated - A treated) [tp1]

The log2FoldChange of res1 and res2 are similar (not exactly identical, as would be expected because the normalization introduces some differences), but the p-values for some genes show more pronounced differences (this is much more exaggerated in my real data set, which obviously has more genes and real differences between the groups). This has a more exaggerated effect in the padj, and then when you start imposing thresholds on padj the differences can be huge in the resulting gene sets. (Note here, I fully understand that part of this is due to the fact that we're using essentially artificial thresholds on the tails of distributions, but for some downstream analyses this becomes something of a necessary evil. The question is whether there's an a priori reason to choose one approach over the other...)

Or am I missing something here, and not really doing the analysis I think I'm doing?

ADD REPLYlink modified 14 months ago • written 14 months ago by James Denvir90
2
gravatar for Ashastry
19 months ago by
Ashastry60
Ashastry60 wrote:

"should I create a separate DESeq object for each contrast using only the samples"

Create one object.

and then for comparison,

do this - res, comparison1<-results(dds, contrast = c("A.untreated", "B.treated")) and so on.

You can list the contrast names in the object by doing resultNames(object).

ADD COMMENTlink written 19 months ago by Ashastry60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1849 users visited in the last hour