Question: DESeq2 proper design setting
Macspider3.3k wrote:

Hi all,

I am performing a differential expression analysis with DESeq2, with these data:

  • control, 2 replicates
  • treatment, 2 replicates

One part of the manual is still obscure to me: the design variable that you can set in many commands. I grasp the concept behind it, but I am still struggling to use it properly. At the moment, I am using:

design = ~ condition

How would you set it, and why? Could someone write a couple of lines on how I should use that variable properly?

Any help appreciated!

written 4.0 years ago by Macspider • modified 4.0 years ago by Carlo Yague
Carlo Yague5.2k wrote:

In your case I would simply use the following, because expression depends on only one factor, the condition (either control or treatment).

condition = as.factor(c("control", "control", "treatment", "treatment"))  # one entry per sample
design = ~ condition

but if your replicates were not processed together, I would also take the batch effect into account.

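batch = as.factor(c("rep1", "rep2", "rep1", "rep2"))  # one entry per sample
design = ~ batch + condition  # variable of interest last: results() uses the last variable by default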
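
As a rough sketch of how these pieces could fit together for the four samples in the question: the sample names (ctrl_1, treat_1, ...) and the rep1/rep2 batch assignment below are made up for illustration. The DESeq2 part only runs if the package is installed, and it uses simulated counts from makeExampleDESeqDataSet rather than real data.

```r
# Hypothetical sample table for 2 controls + 2 treatments.
# Sample names and batch labels are illustrative assumptions.
coldata <- data.frame(
  condition = factor(c("control", "control", "treatment", "treatment"),
                     levels = c("control", "treatment")),  # reference level first
  batch     = factor(c("rep1", "rep2", "rep1", "rep2")),
  row.names = c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
)

design_simple <- ~ condition          # one factor only
design_batch  <- ~ batch + condition  # batch-adjusted, variable of interest last

# Optional: fit the batch-adjusted design on simulated counts.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  dds <- makeExampleDESeqDataSet(n = 1000, m = 4)  # simulated, not real data
  dds$condition <- coldata$condition
  dds$batch     <- coldata$batch
  design(dds)   <- design_batch
  dds <- DESeq(dds)
  # Ask explicitly for treatment vs control instead of relying on defaults.
  res <- results(dds, contrast = c("condition", "treatment", "control"))
  print(head(res))
}
```

Naming the contrast explicitly makes the comparison independent of the order of the terms in the formula.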

More examples in this tutorial.

modified 4.0 years ago • written 4.0 years ago by Carlo Yague

By "processed", do you mean sequenced or quality-filtered?

written 4.0 years ago by Macspider

I mean the RNA extraction and/or library preparation.

For instance, if the first replicate of each condition was extracted on one day and the second replicate of each condition the day after, you could expect some technical variation to affect the measured gene expression. This is called a batch effect, and it is annoying. The good news is that DESeq2 can take it into account in its model.

If all your replicates were processed in parallel, then there is no batch effect. This is an ideal situation.

If you processed the two replicates of the control condition on one day and the two replicates of the treatment condition on another day, then there is a batch effect but you cannot control for it, because batch and condition are completely confounded. This is the worst situation.
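
The difference between the correctable and the confounded scenario can be checked with base R before running anything in DESeq2. This is a minimal sketch: the day1/day2 labels and the helper name is_full_rank are assumptions made for illustration, not part of DESeq2.

```r
condition <- factor(c("control", "control", "treatment", "treatment"))

# Scenario A: one control and one treatment processed per day (balanced).
batch_balanced   <- factor(c("day1", "day2", "day1", "day2"))
# Scenario B: both controls on day 1, both treatments on day 2 (confounded).
batch_confounded <- factor(c("day1", "day1", "day2", "day2"))

# A design can only be estimated if its matrix has as many linearly
# independent columns as it has coefficients (full column rank).
is_full_rank <- function(formula) {
  mm <- model.matrix(formula)
  qr(mm)$rank == ncol(mm)
}

balanced_ok   <- is_full_rank(~ batch_balanced + condition)
confounded_ok <- is_full_rank(~ batch_confounded + condition)
print(c(balanced = balanced_ok, confounded = confounded_ok))
```

In the confounded case the batch column and the condition column of the design matrix are identical, so the two effects cannot be separated.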

modified 4.0 years ago • written 4.0 years ago by Carlo Yague

Thank you. Mine is the first scenario, so I will add the batch term to the design. However, is there a list, a manual, or something similar (other than the official DESeq2 one, which I have already read) that clearly explains which terms can go in the design formula?

written 4.0 years ago by Macspider

The limma User's Guide contains a very good introduction to linear models for designed experiments; it may also help to look at the model.matrix help page (?model.matrix). For your four samples, model.matrix(~ condition) builds a 4x2 matrix with an 'intercept' column of all ones and a column containing two 0s (for the controls) and two 1s (for the treatments). DESeq2 fits one coefficient for each column of the design matrix.
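
The matrix described above can be inspected directly in base R. A small sketch, assuming the samples are ordered as two controls followed by two treatments:

```r
# Two controls followed by two treatments; "control" is the reference level.
condition <- factor(c("control", "control", "treatment", "treatment"),
                    levels = c("control", "treatment"))

mm <- model.matrix(~ condition)
print(mm)            # 4 rows (samples) x 2 columns (coefficients)
print(dim(mm))
print(colnames(mm))  # intercept column plus one column for "treatment"
```

The intercept column corresponds to the baseline (control) expression level, and the second column to the treatment-versus-control difference.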

written 4.0 years ago by russhh

From your question, I feel (but I could be wrong) that you think only specific terms are allowed in the design formula. This is not the case. The name of the factor doesn't matter at all. For instance, instead of condition = as.factor(c("control","treatment")) you could write drug = as.factor(c("YES","NO")) or azerty = as.factor(c("hello","world")).

The design should simply include all the factors that are expected to affect gene expression in your experiment. In your case, the treatment and the batch, whatever the names you give them.
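
To illustrate that names are arbitrary, the sketch below builds the design matrix from three differently named factors with the same grouping. The four-sample layout is an assumption for illustration. What does matter is the level order: R sorts levels alphabetically by default, so here "control", "NO", and "hello" become the reference levels.

```r
# Same grouping of four samples, three different (arbitrary) names.
condition <- factor(c("control", "control", "treatment", "treatment"))
drug      <- factor(c("NO", "NO", "YES", "YES"))
azerty    <- factor(c("hello", "hello", "world", "world"))

mm_condition <- model.matrix(~ condition)
mm_drug      <- model.matrix(~ drug)
mm_azerty    <- model.matrix(~ azerty)

# Only the column labels differ; the numbers are the same.
same_values <- identical(as.vector(mm_condition), as.vector(mm_drug)) &&
  identical(as.vector(mm_condition), as.vector(mm_azerty))
print(same_values)
print(colnames(mm_drug))
```

If the alphabetical default picks the wrong baseline, the reference level can be set explicitly with relevel() or with the levels argument of factor().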

modified 4.0 years ago • written 4.0 years ago by Carlo Yague

You were right, and another piece of my puzzle has fallen into place. New question: what changes when I use ~ or +? I mean, apart from the arithmetic meaning, is there any convention I should know about? I'm going to look through the limma documentation as well.

written 4.0 years ago by Macspider

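This is the usual syntax for a "formula" (see ?formula in R).

~ separates the response (left-hand side) from the model terms (right-hand side). In a DESeq2 design the left-hand side is left empty, so the terms following ~ are the factors in your design.

+ is used to add factors. Note that ~ condition + batch fits the same model as ~ batch + condition, but by default DESeq2's results() reports the comparison for the last variable in the formula, so the variable of interest is usually written last.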

You also have the operators * and : that are used to specify interactions between factors (not needed in your specific case).

More info here and here in the context of ANOVA and linear regression, respectively.
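
The operators above can be explored without any data by asking R to expand a formula into its terms. A minimal sketch using base R only; term_labels is a hypothetical helper name introduced here for illustration:

```r
# Expand a formula into the model terms it stands for.
term_labels <- function(f) attr(terms(f), "term.labels")

additive_1  <- term_labels(~ condition + batch)
additive_2  <- term_labels(~ batch + condition)
crossed     <- term_labels(~ condition * batch)  # main effects + interaction
interaction <- term_labels(~ condition:batch)    # interaction term only

print(additive_1)
print(additive_2)
print(crossed)
print(interaction)
```

The two additive formulas contain the same terms in a different order, and a * b is shorthand for a + b + a:b.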

modified 4.0 years ago • written 4.0 years ago by Carlo Yague

That was exactly what I was looking for. Thank you.

written 4.0 years ago by Macspider