Question: Some questions regarding DESeq2
10 months ago, wangdp123 wrote:

Hi there,

I am reading through the manual of the DESeq2 package and have run into two questions about how to use it properly.

1) There are two ways to perform the variance-stabilising transformation:

i) vsd <- vst(dds, blind=TRUE)
ii) vsd <- vst(dds, blind=FALSE)

I understand that blind=TRUE (the default) gives an unsupervised analysis, which is good for quality assurance of the samples, whereas blind=FALSE makes use of the design formula to estimate the dispersion, which is good for downstream analysis.

Which one is recommended, or considered more reasonable, if my aim is to make PCA plots and heatmaps that show the clustering of all samples for publication?

2) When paired samples are analysed for differential expression (e.g., the same subject before and after treatment), I realise that a "subject" term should be included in the design formula in addition to the "condition" term. However, which of the formulas below should be preferred, and why?

i) ~ subject + condition
ii) ~ subject + condition + subject:condition

Essentially, the question is: under what circumstances should the interaction term ("subject:condition") be used?

Many thanks,


10 months ago, dsull wrote:

1) Go with blind=FALSE for PCA plots and clustered heatmaps in publications:

"Therefore, for visualization, clustering, or machine learning applications, I tend to recommend blind=FALSE." - Michael Love on
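As a rough sketch of what this looks like in practice, the snippet below runs vst with blind=FALSE on DESeq2's simulated example dataset and then derives the inputs for a PCA plot and a sample-distance heatmap. The simulated data (makeExampleDESeqDataSet), the object names and the gene/sample counts are assumptions for illustration only; substitute your own dds.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for real data (assumption): 5000 genes, 12 samples,
# with a two-level "condition" factor and design ~ condition.
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# blind = FALSE: the design formula is used when estimating the dispersion trend
vsd <- vst(dds, blind = FALSE)

# PCA coordinates as a data frame; drop returnData = TRUE to get the plot itself
pca_data <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)

# Sample-to-sample Euclidean distances on the transformed values,
# usable as input to a clustered heatmap
sample_dists <- dist(t(assay(vsd)))
```

The matrix as.matrix(sample_dists) can then be handed to a heatmap function of your choice (for example pheatmap::pheatmap, if that package is installed).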

2) Just use ~ subject + condition to account for sample pairing

This treats the subject and condition effects as additive: if condition affects gene expression, it does so irrespective of subject, and if subject affects gene expression, it does so irrespective of condition. In other words, the subject-to-subject differences in expression are accounted for. The model is basically saying: each subject has its own baseline expression for each gene, and that baseline differs from subject to subject, but the condition or treatment isn't expected to cause a bigger expression change in one subject than in another.
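To make the additive model concrete, here is a minimal base-R sketch of the design matrix it implies. The three subjects (A, B, C) and the condition levels "before"/"after" are hypothetical; only model.matrix from base R is used, so no count data are needed.

```r
# Hypothetical paired layout: 3 subjects, each sampled before and after treatment
coldata <- data.frame(
  subject   = factor(rep(c("A", "B", "C"), each = 2)),
  condition = factor(rep(c("before", "after"), times = 3),
                     levels = c("before", "after"))
)

# Additive design: one baseline shift per subject, one shared condition effect
design_additive <- model.matrix(~ subject + condition, data = coldata)
colnames(design_additive)
# "(Intercept)" "subjectB" "subjectC" "conditionafter"

# Residual degrees of freedom left for estimating dispersion: 6 samples - 4 coefficients
resid_df_additive <- nrow(design_additive) - qr(design_additive)$rank

# With real counts (a hypothetical matrix `counts`), the same formula would go to DESeq2:
# dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
#                               design = ~ subject + condition)
```

The single conditionafter column is the treatment effect shared by all subjects; the subject columns only absorb each subject's baseline.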

You use interaction terms when you actually think that the effect of the treatment (condition) on gene expression will differ depending on the subject (e.g. the treatment changes subject A's expression differently from subject B's). I tend to use interaction terms when, say, I have two variables, treatment and sex, and my treatment affects males differently from females (i.e. there is an interaction between treatment and sex).
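A small base-R sketch can show what an interaction term adds to the design, and one practical caveat for the paired case. The sample layouts and factor levels below are hypothetical. With a single sample per subject per condition, subject:condition needs as many coefficients as there are samples, so no residual degrees of freedom remain for estimating dispersion; a sex-by-treatment interaction with replicates does not have that problem.

```r
# Hypothetical layout: 2 sexes x 2 treatments, 2 replicates per combination
coldata_sex <- data.frame(
  sex       = factor(rep(c("F", "M"), each = 4)),
  treatment = factor(rep(c("ctrl", "trt"), times = 4), levels = c("ctrl", "trt"))
)
design_sex <- model.matrix(~ sex + treatment + sex:treatment, data = coldata_sex)
colnames(design_sex)
# "(Intercept)" "sexM" "treatmenttrt" "sexM:treatmenttrt"
# The last column is the extra treatment effect in males relative to females.
resid_df_sex <- nrow(design_sex) - qr(design_sex)$rank        # 8 - 4

# Hypothetical paired layout: 3 subjects, one sample before and one after
coldata_paired <- data.frame(
  subject   = factor(rep(c("A", "B", "C"), each = 2)),
  condition = factor(rep(c("before", "after"), times = 3),
                     levels = c("before", "after"))
)
design_paired <- model.matrix(~ subject + condition + subject:condition,
                              data = coldata_paired)
# 6 coefficients for 6 samples: the model is saturated
resid_df_paired <- nrow(design_paired) - qr(design_paired)$rank  # 6 - 6
```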
