I have barcode count data corresponding to the viability of 25+ pooled bacterial strains under various conditions. The marginal distribution of untreated strain counts appears to be Negative Binomial.
I'm trying to use DESeq2 to analyze these data, using a matrix with strains ("genes") as rows and conditions as columns. For most strains, counts vary widely between conditions but relatively little between replicates, so it seems sensible (in this case) to estimate dispersions on a gene- and condition-wise basis.
The language in the DESeq2 vignettes and pre-print seems to suggest the dispersion estimates are "gene-wise". So if you run DESeq() followed by plotDispEsts(), does each point correspond to the dispersion estimate of a gene (in my case, strain) across conditions, or to the dispersion estimate between replicates of a gene under one condition?
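To make the question concrete, here is a minimal sketch of the two quantities I'm comparing. It uses simulated counts from makeExampleDESeqDataSet() rather than my own data, and the object names (dds0, dds_a, disp_full, disp_a) and the ~ 1 design for the single-condition subset are my own assumptions for illustration, not something taken from the vignette.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a strain-by-sample count matrix:
# 1000 rows ("strains"), 6 samples, conditions A and B with 3 replicates each.
dds0 <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Full analysis with the default design of the simulated object (~ condition).
dds <- DESeq(dds0)
plotDispEsts(dds)                # the plot I am asking about
disp_full <- dispersions(dds)    # final dispersion values, one per row

# For comparison: dispersions estimated from the condition A replicates only.
# With a single condition left, an intercept-only design is assumed here.
dds_a <- dds0[, dds0$condition == "A"]
dds_a$condition <- droplevels(dds_a$condition)
design(dds_a) <- ~ 1
dds_a <- estimateSizeFactors(dds_a)
dds_a <- estimateDispersions(dds_a)
disp_a <- dispersions(dds_a)     # one value per row, from condition A alone

# Both vectors have one entry per row, so they can be compared strain by strain.
summary(disp_full)
summary(disp_a)
```

If the full fit really gives a single value per strain, then disp_full is what I would be trying to replace with something like disp_a for each condition.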
I think the conceptual difference I'm talking about is the same as that between
blind=FALSE in the
Finally, if DESeq2 does estimate dispersions solely on a gene-wise basis, would it be reasonable for me to estimate the dispersions of my data subset by each condition in turn, and then feed those results into my whole
DESeqDataSet object using
Many thanks for taking the time to read, and for any suggestions you might have.