Question

DESeq2: Is dispersion estimation gene-wise or gene- and condition-wise?

8.3 years ago

Eachan Johnson ▴ 20

Hello all,

I have barcode count data corresponding to the viability of 25+ pooled bacterial strains under various conditions. The marginal distribution of untreated strain counts appears to be Negative Binomial.

I'm trying to use DESeq2 to analyze these data, using a matrix of strains ("genes") as rows and conditions as columns. Since the variation of counts between most conditions for most strains is very large, but between replicates is relatively small, it seems sensible to estimate dispersions (in this case) on a gene- and condition-wise basis.

The language in the DESeq2 vignettes and pre-print seems to suggest the dispersion estimates are "gene-wise". So if you run DESeq() followed by plotDispEsts(), does each point correspond to the variance estimate of a gene (in my case, strain) across conditions, or to the variance estimate between replicates of a gene under one condition?

I think the conceptual difference I'm talking about is the same as that between blind=TRUE and blind=FALSE in the rlog() and varianceStabilizingTransformation() functions.

Finally, if DESeq2 does estimate dispersions on a solely gene-wise basis, would it be reasonable for me to estimate the dispersions of my data subsetted by each condition in turn, and then feed those results into my whole DESeqDataSet object using dispersions()?

Many thanks for taking the time to read, and for any suggestions you might have.

Eachan

RNA-Seq dispersion-estimation R DESeq2 • 5.0k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.3 years ago by Eachan Johnson ▴ 20

Ram · Accepted Answer · 2016-01-06

8.3 years ago

vivekbhr ▴ 690

The estimateDispersions() function estimates gene-wise dispersions using all columns. However, a GLM is fitted first (based on within-group means and variances), and the dispersion is estimated from the variation around those fitted values, so the differences between conditions are already accounted for. So I think you don't have to (and probably shouldn't) use condition-wise estimation.
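To make the distinction concrete, here is a minimal Python sketch of the idea. It is not DESeq2 code and not how DESeq2 computes dispersions: it uses a crude method-of-moments estimator instead of DESeq2's likelihood-based estimate and shrinkage, and it assumes equal size factors. The function names `moment_dispersion` and `condition_means` and the counts are made up for illustration.

```python
from collections import defaultdict


def moment_dispersion(counts, fitted_means, n_params):
    """Crude method-of-moments NB dispersion for one row (strain).

    Uses Var(y) = mu + alpha * mu^2 and solves for alpha, with an
    n / (n - n_params) correction for the number of fitted means.
    This is an illustrative estimator, not DESeq2's.
    """
    n = len(counts)
    sq_resid = sum((y - mu) ** 2 for y, mu in zip(counts, fitted_means))
    scaled = sq_resid * n / (n - n_params)
    alpha = (scaled - sum(fitted_means)) / sum(mu ** 2 for mu in fitted_means)
    return max(alpha, 0.0)


def condition_means(counts, conditions):
    """Fitted mean per sample under a one-factor design (~ condition)."""
    groups = defaultdict(list)
    for y, c in zip(counts, conditions):
        groups[c].append(y)
    return [sum(groups[c]) / len(groups[c]) for c in conditions]


# Hypothetical counts for ONE strain: three conditions, each in duplicate.
counts = [100, 110, 1000, 1050, 10, 12]
conditions = ["A", "A", "B", "B", "C", "C"]

# Design-aware: residuals are taken around each condition's own mean,
# so only replicate-to-replicate variation contributes.
design_aware = moment_dispersion(
    counts, condition_means(counts, conditions), n_params=3
)

# Blind: every sample is compared with the grand mean,
# so the condition effects themselves are counted as noise.
grand_mean = sum(counts) / len(counts)
blind = moment_dispersion(counts, [grand_mean] * len(counts), n_params=1)

print(f"design-aware: {design_aware:.5f}, blind: {blind:.3f}")
```

In this toy example there is a single dispersion value per strain in both cases. What changes is the set of fitted means the residuals are measured against, which is roughly the same contrast as blind=FALSE versus blind=TRUE.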

By the way, why do you want to use DESeq2 here, with a dataset of only 25 rows?

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by vivekbhr ▴ 690

Indeed, one should not split things by group before estimating dispersions. There is still a power increase with 25 rows when using DESeq2 (or similar) versus a straight GLM, though it's fairly small.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

Thanks, Devon, for your advice. To add to the reason why I'm using DESeq2, I'm also trying to avoid reinventing the wheel.

ADD REPLY • link 8.3 years ago by Eachan Johnson ▴ 20

That's usually a pretty compelling reason :)

BTW, you might instead consider limma. I'm not entirely sure how well DESeq2 scales to the number of samples you have (limma has historically had better luck there, given how it works).

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

DESeq2 is slower than in the transposed situation (many rows, few columns), but not unmanageably so: it takes about a day on our server, which I can live with once I'm sure the output is meaningful for my application. I will be sure to investigate limma as well. Thanks again!

ADD REPLY • link 8.3 years ago by Eachan Johnson ▴ 20

Thanks for your helpful input, Vivek. I'll re-read the DESeq2 paper.

To answer your question, the number of columns is the issue more than the number of rows. The experiment I'm working with has 50,000 conditions in duplicate, and I'm interested in differential outgrowth of the 25 strains.

ADD REPLY • link 8.3 years ago by Eachan Johnson ▴ 20

@Eachan Wow, this sounds interesting.

ADD REPLY • link 8.3 years ago by vivekbhr ▴ 690

"50,000 conditions in duplicate"

Out of curiosity, what experimental design or protocol allows you to test so many conditions? And yes, sounds interesting!

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by dariober 14k