Question: Estimating Dispersions In DESeq: Which Is Best, All Conditions At Once, Or Each Condition Comparison Individually?
7.4 years ago, gaelgarcia05 wrote:

Hi everyone,

I have an RNA-seq dataset which comprises 4 different conditions, and 2 biological replicates per condition, like so:

              cond1_1        cond1_2      cond2_1     cond2_2      cond3_1      cond3_2      cond4_1      cond4_2   

Currently, I run estimateSizeFactors and estimateDispersions on one table of 2 conditions (4 samples) at a time, one table per comparison. That is, I build a data frame containing ONLY the samples for comparison X, run estimateSizeFactors and estimateDispersions on it, and then run the negative binomial test (nbinomTest) on the result.

I am wondering, however, whether it would be better to give DESeq all the samples for estimateSizeFactors and estimateDispersions, and then run the pairwise condition comparisons. Might this provide more information per gene, or would it be counterproductive?

Thanks, Carmen

R bioconductor deseq rna-seq edger • 4.4k views
7.4 years ago, Johanna Schott wrote:

One way to compare what happens in either case is to use plotDispEsts().

If you pool conditions for variance estimation, you assume that the expression of only a few genes changes while the vast majority do not, so that the different conditions behave like biological replicates. In one of my datasets, however, I had conditions where large groups of genes were regulated, because I was looking at a very drastic response of the cells. In such a case, treating different conditions like replicates will overestimate the variance: genes that are actually regulated by your condition will look like they have a high variance. As a consequence, the test for differential expression will be rather strict, and you will get a shorter list of regulated genes. You have to decide which way is better for you, depending on your data and on what you expect from your analysis.
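
To make this concrete, here is a toy base-R sketch with made-up counts (not from any real dataset). It compares a plain per-gene variance computed within each condition against one computed across all samples with the condition labels ignored. This is only the raw sample variance, not DESeq's dispersion estimate, and the gene names and numbers are hypothetical, but it shows the same effect: a regulated gene looks far more variable once the conditions are treated like replicates.

```r
# Toy counts for two genes, 2 conditions x 2 replicates (made-up numbers).
counts <- rbind(
  unchanged = c(100, 110, 105, 95),
  regulated = c(100, 110, 400, 420)
)
colnames(counts) <- c("cond1_1", "cond1_2", "cond2_1", "cond2_2")
condition <- factor(c("cond1", "cond1", "cond2", "cond2"))

# Variance computed within each condition, then averaged across conditions.
within_var <- apply(counts, 1, function(x) mean(tapply(x, condition, var)))

# Variance computed across all samples, ignoring the condition labels.
blind_var <- apply(counts, 1, var)

# Side-by-side comparison, one row per gene.
print(cbind(within_var, blind_var))
```

For the unchanged gene the two numbers are of similar size, while for the regulated gene the label-blind variance is much larger, which is what makes the test stricter.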

7.4 years ago, Steve Lianoglou wrote:

I believe the usual recommendation is to use all of your data for the dispersion estimation.

Also, I'd recommend checking out DESeq2, there are some nice new enhancements over DESeq.
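
For anyone who wants to try both approaches side by side in DESeq2, a rough sketch might look like the following. The counts are simulated, so the gene names, the means and dispersions, and the padj < 0.1 cutoff are all assumptions made for illustration; with real data you would pass your own count matrix and sample table. The block is skipped if DESeq2 is not installed.

```r
# Sketch only: simulated counts stand in for a real count table.
# Skipped entirely if DESeq2 is not installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)

  n_genes <- 1000
  samples <- paste0(rep(paste0("cond", 1:4), each = 2), "_", 1:2)
  mu <- 2^runif(n_genes, 4, 12)   # per-gene mean (hypothetical)
  disp <- 0.1 + 4 / mu            # per-gene dispersion (hypothetical)
  counts <- matrix(
    rnbinom(n_genes * 8, mu = rep(mu, 8), size = rep(1 / disp, 8)),
    ncol = 8, dimnames = list(paste0("gene", seq_len(n_genes)), samples)
  )
  col_data <- data.frame(
    condition = factor(rep(paste0("cond", 1:4), each = 2)),
    row.names = samples
  )

  dds <- DESeqDataSetFromMatrix(counts, col_data, design = ~ condition)

  # Approach A: fit all 8 samples once, then pull out a pairwise contrast.
  dds_all <- DESeq(dds)
  res_all <- results(dds_all, contrast = c("condition", "cond2", "cond1"))

  # Approach B: subset to one pair of conditions and fit only those 4 samples.
  dds_pair <- dds[, dds$condition %in% c("cond1", "cond2")]
  dds_pair$condition <- droplevels(dds_pair$condition)
  dds_pair <- DESeq(dds_pair)
  res_pair <- results(dds_pair, contrast = c("condition", "cond2", "cond1"))

  # Compare how many genes each approach calls at padj < 0.1.
  print(c(all_samples = sum(res_all$padj < 0.1, na.rm = TRUE),
          pair_only   = sum(res_pair$padj < 0.1, na.rm = TRUE)))
}
```

Calling plotDispEsts(dds_all) and plotDispEsts(dds_pair) afterwards lets you compare the two dispersion fits directly.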


Thanks, Steve, I'll be sure to check it out!

written 7.4 years ago by gaelgarcia05