Deseq2 with one factor and multiple levels
1
8
Entering edit mode
3.6 years ago

Dear all,

I'm dealing with and RNA-seq experiment with a total of 11 different conditions.

I'd like to do pairwise comparisons between some of them (without having a fixed base/ctrl condition). To do that, I'm planning to use the Wald-test.

After setting conditions (n=11), this is the command I used (in this example I want to compare condition C to E)

dds <- DESeqDataSetFromMatrix(countData = cts,
colData = colData,
design = ~ 0 + condition)
dds <- DESeq(dds)
res_condC_versus_condE = results(dds, contrast=c("condition","condC","condE"))


I've two main questions:

1. Do you think the design formula and the results command are set in the correct way?
2. Should I use lfcShrink only for data visualization and ranking??

Best

Marianna

rna-seq deseq2 • 5.6k views
10
Entering edit mode
3.6 years ago

Hey Marianna,

Regarding the design formula, why are you including ~ 0? You should just be doing ~ condition. For further reading on this, I direct you to these posts:

Regarding lfcshrink, I now implement lfcshrink into every analysis as it gives more realistic fold-change estimates. Please see the explanation of the DESeq2 developer, HERE.

Kevin

0
Entering edit mode

Hi Kevin,

If I use ~ condition, then the first level of the factor "condition" is fixed as a control/base condition and all the other levels will be compared to this control condition. How to deal with that?? I thought the only solution was to remove the intercept, so I can set the DE analysis between any condition. Am I wrong?? Are there alternative solutions to easily compare the different levels without fixing a base condition?

As far as lfcshrink, I assume I can use it instead of results(dds).

Thank you Marianna

Thank you

1
Entering edit mode

When you specify what to compare to what with contrast, it doesn't matter what the base level is for your factor.

0
Entering edit mode

Ah!, no, you can leave the intercept in the model (just use ~ condition) and then do any pairwise comparison irerspective of the reference level, as I show, here: A: DESeq2 compare all levels

In that thread, I also show how you can use lfcshrink()

1
Entering edit mode

Thank you Kevin, really appreciated

Marianna

0
Entering edit mode

Hi Kevin,

I did the analysis as you suggested. I noticed that the log2fold changes are REALLY different if using lfcShrink.

For example for the most DE gene I got a log2FC of 4 and 0.05 using lfcShrink and results, respectively. Do you expect such a big difference??

Best Marianna

0
Entering edit mode

I am going to 'randomly' hypothesise that the gene in question has generally low expression in your data? Genes with low expression are quite problematic because they can appear to have inflated fold changes.

For example:

0.9 / 0.05 = 18

100 / 50 = 2


Although the first one has a fold change (linear) of 18, is it meaningful when considering that the expression of both genes in question is 0.9 and 0.05?

0
Entering edit mode

Hi! Sorry to dig up this old thread, but it felt more appropriate to expand the discussion here than posting a new question. Could you please clarify why should the intercept be left in the model? For some reason I just can't wrap my head around linear model intercepts in DE tools (or in general). In the example posted above I kind of see why having an intercept is fine since the contrast argument does not really use the intercept anywhere (I might be way off on this though...), so you just specify your pairwise comparisons with contrast. But following the same logic, why would having a design ~0 + condition be any different then (e.g. they don't use an intercept in this tutorial either https://www.biostars.org/p/461026/)? Or maybe the other way, when would removing an intercept be appropriate in DESeq2?

What prompts me to ask is that I tried all various re-levelling and w/wo intercept designs using contrasts to see what happens (study design like in the above post), the pairwise comparisons come back basically identical (same nr of significant DE and highly similar LFCs) , except for one single dreadful gene that is driving me nuts. (yes, it is a noisy and low expressing one, but now that I found this I can't un-find it. It also happens to gain the highest significant LFC in one of the comparisons, which is just completely off). The only approaches where this seems to be handled appropriately is when I do a no intercept model for the pairwise comparisons or for each pairwise comparison I re-level the factor first for the appropriate comparison, even though I am using contrast like posted here.

2
Entering edit mode

An intercept is useful when you want to think in terms of a factorial design. You then don't need contrasts, since you implicitly have one. If you have something like condition with a bunch of levels and want to do pair-wise comparisons then you dropping the intercept is convenient, since you quite explicitly don't have a factorial design (i.e., something like WT vs. Mutant and Treated vs. Untreated).

You can generally perform the same comparisons with/without an intercept, it's primarily a matter of convenience whether to include one or not.

0
Entering edit mode

Hi, Devon! Thanks a lot for the reply. Okay, so the intercept in DESeq2 is basically only used if you have what you said a factorial design. Used to specify what is the 'baseline' you are comparing against - and this is applied when you use results(dds, name=) instead of contrast, because for a contrast you specifically say which is the baseline you want to test against? Would it not be more straightforward to always explicitly use contrasts without an intercept even for a factorial design or what is the advantage of an intercept in that case? I think I am missing some key concept, I've been doing my best to get through this http://genomicsclass.github.io/book/ for the linear models and I kind of understand it, but also kind of don't at all (at least the intercept).

Thanks again for your reply, it reassures me that at least I am allowed to drop the intercept for the pairwise comparisons.

2
Entering edit mode

Would it not be more straightforward to always explicitly use contrasts without an intercept even for a factorial design or what is the advantage of an intercept in that case?

It's just a matter of how you prefer to think of your experiment. A very common experimental design in biology is a 2x2 factorial design, such as WT/Mut and Treated/Untreated. If I use a model of ~genotype + treatment + genotype:treatment then I directly assess the biological questions I care about: what's the effect of the genotype, the mutation, and their interaction. If I exclude the intercept I need to make groups of genotype_treatment and then use something like ~0 + condition followed by constructing a bunch of contrasts. Those contrasts may be easy for someone like me to construct, but the average wet lab biologist has a near 0 change of doing that correctly. The common retort is that dropping the intercept and using ~0 + condition allows you to do pairwise comparisons between groups. That's nice and all, but those rarely tell me anything biologically worthwhile. So this comes down to a question of what you want to get out of your experiment and how to most easily do so.

Would it not be more straightforward to always explicitly use contrasts

Try doing that with very complex experimental designs. You can do it, but it's not exactly fun.

0
Entering edit mode

Thanks again, Devon, this is very helpful!

I see your point and that the intercept can be useful for more complicated designs than simple condition_A vs condition_B. I think to start with I find contrasts more intuitive (maybe even for complex designs) or I might also just be misleading myself. I wonder where this massive gap in my knowledge comes from, I need to create an 'Intercept for Dummies' guide for myself and soldier on with my simple comparisons for now.

Traffic: 2740 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.