Question

Estimating Dispersions In DESeq: Which Is Best, All Conditions At Once, Or Each Condition Comparison Individually?

11.0 years ago

gaelgarcia05 ▴ 280

Hi everyone,

I have an RNA-seq dataset which comprises 4 different conditions, and 2 biological replicates per condition, like so:

              cond1_1        cond1_2      cond2_1     cond2_2      cond3_1      cond3_2      cond4_1      cond4_2   
gene1       
gene2
gene3
gene4
gene5

Currently, I run estimateSizeFactors and estimateDispersions on one pair of conditions (4 samples) at a time, i.e. separately for each comparison. I build a data frame containing ONLY the samples for comparison X, run estimateSizeFactors and estimateDispersions on it, and then run the negative binomial test (nbinomTest) on the result.

I am wondering, however, whether it would be better to give DESeq all the samples for estimateSizeFactors and estimateDispersions, and only then run the pairwise condition comparisons. Might this provide more information per gene, or would it be counterproductive?

Thanks, Carmen

deseq rna-seq r edger bioconductor • 5.6k views

updated 7.6 years ago by Biostar 20 • written 11.0 years ago by gaelgarcia05 ▴ 280

score 4 · Answer 1 · 2013-05-13

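One way to compare what happens in either case is to run plotDispEsts() on both the all-samples and the per-comparison dispersion estimates and look at the two plots side by side.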
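A minimal sketch of that comparison, assuming the legacy DESeq package is installed. The count matrix is simulated here purely for illustration; `count_mat`, the gene count, and the dispersion trend used to simulate it are hypothetical stand-ins for your own data.

```r
# Sketch only: requires the legacy Bioconductor package DESeq.
library(DESeq)

set.seed(1)
n_genes <- 1000
# Hypothetical simulated counts: 4 conditions x 2 replicates, no true changes.
mu_g   <- exp(runif(n_genes, log(10), log(5000)))   # per-gene mean
disp_g <- 0.05 + 4 / mu_g                           # assumed dispersion trend
count_mat <- matrix(
  rnbinom(n_genes * 8, mu = rep(mu_g, 8), size = rep(1 / disp_g, 8)),
  ncol = 8
)
colnames(count_mat) <- paste0("cond", rep(1:4, each = 2), "_", rep(1:2, 4))
rownames(count_mat) <- paste0("gene", seq_len(n_genes))
conditions <- factor(rep(paste0("cond", 1:4), each = 2))

# Approach A: size factors and dispersions from all 8 samples.
cds_all <- newCountDataSet(count_mat, conditions)
cds_all <- estimateSizeFactors(cds_all)
cds_all <- estimateDispersions(cds_all)

# Approach B: only the 4 samples of one comparison (cond1 vs cond2).
keep <- conditions %in% c("cond1", "cond2")
cds_pair <- newCountDataSet(count_mat[, keep], droplevels(conditions[keep]))
cds_pair <- estimateSizeFactors(cds_pair)
cds_pair <- estimateDispersions(cds_pair)

# Dispersion plots side by side.
par(mfrow = c(1, 2))
plotDispEsts(cds_all);  title("all samples")
plotDispEsts(cds_pair); title("cond1 vs cond2 only")

# The same pairwise test on both objects.
res_all  <- nbinomTest(cds_all,  "cond1", "cond2")
res_pair <- nbinomTest(cds_pair, "cond1", "cond2")
```

With real data, comparing the two plots and the number of genes below your adjusted p-value cutoff in `res_all` and `res_pair` should show how much the choice matters for your experiment.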

If you pool conditions for variance estimation and treat them like replicates (in DESeq, this is what method = "blind" does), you assume that the expression of only a few genes changes while the vast majority stays the same, so the different conditions behave much like biological replicates. In one of my datasets, however, large groups of genes were regulated, because I was looking at a very drastic response of the cells. In such a case, treating different conditions like replicates overestimates the variance: genes that are actually regulated by your condition look like they have a high variance. As a consequence, the test for differential expression becomes rather conservative, and you end up with a shorter list of regulated genes. You have to decide which approach suits you better, depending on your data and on what you expect from your analysis.
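To make the variance argument concrete, here is a small base-R simulation for a single hypothetical gene whose true mean differs strongly between conditions. The means and the dispersion are made-up values. It contrasts the variance computed within each condition with the variance obtained when all 8 samples are treated as replicates of one condition.

```r
set.seed(42)

# Hypothetical gene: strongly regulated in cond3 and cond4.
true_means <- c(cond1 = 100, cond2 = 100, cond3 = 400, cond4 = 800)
condition  <- factor(rep(names(true_means), each = 2),
                     levels = names(true_means))

# Two replicates per condition; size = 1 / dispersion, dispersion assumed 0.05.
counts <- rnbinom(8, mu = rep(true_means, each = 2), size = 1 / 0.05)

# Variance estimated within each condition, then averaged over conditions.
within_var <- mean(tapply(counts, condition, var))

# Variance when the condition labels are ignored and all 8 samples
# are treated as replicates of a single condition.
blind_var <- var(counts)

cat("within-condition variance:", within_var, "\n")
cat("label-blind variance:     ", blind_var, "\n")
```

The label-blind variance absorbs the differences between the condition means, so for a regulated gene it comes out much larger than the within-condition variance. That inflation is what makes the test more conservative for exactly the genes you are looking for.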

score 2 · Answer 2 · 2013-05-13

11.0 years ago

Steve Lianoglou 5.2k

I believe the usual recommendation is to use all of your data for the dispersion estimation.

Also, I'd recommend checking out DESeq2, which has some nice enhancements over DESeq.
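As a rough sketch of what the all-samples approach looks like in DESeq2 (assuming the DESeq2 package is installed): dispersions are estimated once from all 8 samples, and each pairwise comparison is then pulled out with a contrast. The count matrix is simulated for illustration only, and `count_mat` and `coldata` are hypothetical names.

```r
# Sketch only: requires the Bioconductor package DESeq2.
library(DESeq2)

set.seed(1)
n_genes <- 1000
# Hypothetical simulated counts: 4 conditions x 2 replicates.
mu_g   <- exp(runif(n_genes, log(10), log(5000)))   # per-gene mean
disp_g <- 0.05 + 4 / mu_g                           # assumed dispersion trend
count_mat <- matrix(
  rnbinom(n_genes * 8, mu = rep(mu_g, 8), size = rep(1 / disp_g, 8)),
  ncol = 8
)
colnames(count_mat) <- paste0("cond", rep(1:4, each = 2), "_", rep(1:2, 4))
rownames(count_mat) <- paste0("gene", seq_len(n_genes))

coldata <- data.frame(
  condition = factor(rep(paste0("cond", 1:4), each = 2)),
  row.names = colnames(count_mat)
)

# One model for all samples: size factors, dispersions and fit in one call.
dds <- DESeqDataSetFromMatrix(countData = count_mat,
                              colData   = coldata,
                              design    = ~ condition)
dds <- DESeq(dds)

# Pairwise comparisons extracted from the same fitted object.
res_2_vs_1 <- results(dds, contrast = c("condition", "cond2", "cond1"))
res_4_vs_3 <- results(dds, contrast = c("condition", "cond4", "cond3"))

# Dispersion plot for the all-samples fit.
plotDispEsts(dds)
```

Because the design formula includes the condition, the dispersion estimates here use the condition labels; the conditions are not treated as replicates of each other.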