Yes, from what I understand, DESeq2 does not fit group-specific dispersion estimates: a single dispersion value is estimated for each gene and shared by all samples, irrespective of the groups in your design model (the design is used to model each gene's mean counts, not to split its dispersion by group). In very large datasets, it may be more intuitive to calculate dispersion within your groups of interest and apply weightings, whereas for smaller datasets, trying to do this could give very unreliable dispersion estimates and, it follows, unreliable statistical interpretations of the data.
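To make the distinction concrete, here is a minimal Python sketch, assuming normalised counts and a simple method-of-moments estimator (not the likelihood-based estimator DESeq2 actually uses). `per_group_dispersion` estimates a dispersion from one group alone, while `pooled_dispersion` keeps a separate mean for each group but shares one dispersion across them, which is the idea behind DESeq2's single per-gene value. Both function names are hypothetical.

```python
from statistics import mean, variance


def per_group_dispersion(counts):
    """Moment estimate (s^2 - m) / m^2 from a single group's normalised counts."""
    m = mean(counts)
    return (variance(counts) - m) / m ** 2


def pooled_dispersion(groups):
    """One dispersion shared by all groups, with a separate mean per group.

    Each group's sample variance s_g^2 estimates m_g + alpha * m_g^2, so the
    (n_g - 1)-weighted excess variance is divided by the (n_g - 1)-weighted m_g^2.
    """
    num = sum((len(g) - 1) * (variance(g) - mean(g)) for g in groups)
    den = sum((len(g) - 1) * mean(g) ** 2 for g in groups)
    return num / den


if __name__ == "__main__":
    # Toy normalised counts for one gene: three replicates in each of two groups.
    control, treated = [10, 20, 30], [100, 120, 140]
    print("control only:", per_group_dispersion(control))
    print("treated only:", per_group_dispersion(treated))
    print("pooled      :", pooled_dispersion([control, treated]))
```

The pooled value is a weighted average of the per-group values, with weights (n_g - 1) * m_g^2, so with only three replicates per group it borrows strength from both groups instead of relying on one noisy estimate per group.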
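Under DESeq2's negative binomial model, the variance of a gene's counts is:
Var = mu + alpha * mu^2
...where mu is the mean and alpha is the dispersion. Dividing through by mu^2 gives:
Var / mu^2 = 1/mu + alpha
The left-hand side is CoV^2 (the squared coefficient of variation), so for genes with reasonably high counts, where the Poisson term 1/mu is small, the dispersion is approximately CoV^2. See here: https://support.bioconductor.org/p/88880/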
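A quick numerical check of the relationship above, as a minimal Python sketch (the function names are illustrative, not DESeq2 functions): for a fixed dispersion alpha = 0.1, CoV^2 = 1/mu + alpha approaches alpha as the mean count grows.

```python
def nb_variance(mu: float, alpha: float) -> float:
    """Negative binomial variance in the (mean, dispersion) parameterisation."""
    return mu + alpha * mu ** 2


def squared_cv(mu: float, alpha: float) -> float:
    """CoV^2 = Var / mu^2 = 1/mu + alpha: Poisson noise plus dispersion."""
    return nb_variance(mu, alpha) / mu ** 2


if __name__ == "__main__":
    alpha = 0.1
    for mu in (5, 50, 500, 5000):
        # The Poisson term 1/mu fades, so CoV^2 approaches alpha as counts grow.
        print(f"mu={mu:>5}  CoV^2={squared_cv(mu, alpha):.4f}  alpha={alpha}")
```

Running it prints CoV^2 values of 0.3000, 0.1200, 0.1020 and 0.1002 for means of 5, 50, 500 and 5000, which is why dispersion and CoV^2 only coincide for well-expressed genes.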
I have my own summary of how DESeq2 models dispersion:

1. Calculate the maximum-likelihood estimate (MLE) of dispersion for each gene in the dataset (black dots).
2. Model the MLEs as a function of mean counts (red curve).
3. From the curve fitted in step 2, predict a value for each gene.
4. Fit an empirical Bayes model to the MLEs, using the values predicted in step 3 as the prior means for each gene. In empirical Bayes statistics, the prior is itself estimated from the data (here, the trend across all genes), and each gene's own estimate is 'shrunk' toward that prior, more strongly when the gene's data carry little information.
5. Predict values from this model (blue dots): these are the final dispersion estimates.
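The five steps can be sketched in plain Python as below. This is a simplified illustration under several assumptions, not DESeq2's implementation: it uses an intercept-only model (the mean is just the sample mean), a plain NB likelihood rather than DESeq2's Cox-Reid adjusted profile likelihood, an ordinary least-squares fit of the trend alpha(mu) = a0 + a1/mu, and a fixed prior standard deviation `prior_sd` on log(alpha), whereas DESeq2 estimates that width from the data. The toy counts are invented for illustration, and all function names are hypothetical.

```python
import math
from statistics import mean


def nb_loglik(counts, mu, alpha):
    """NB log-likelihood with mean mu and dispersion alpha (Var = mu + alpha*mu^2)."""
    r = 1.0 / alpha  # the NB 'size' parameter
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )


def argmax_1d(f, lo, hi, iters=100):
    """Golden-section search for the maximiser of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


LOG_LO, LOG_HI = math.log(1e-6), math.log(10.0)  # search range for log(alpha)


def genewise_mle(counts):
    """Step 1: gene-wise MLE of alpha, with the mean held at the sample mean."""
    mu = mean(counts)
    t = argmax_1d(lambda s: nb_loglik(counts, mu, math.exp(s)), LOG_LO, LOG_HI)
    return mu, math.exp(t)


def fit_trend(mus, alphas):
    """Steps 2-3: least-squares fit of alpha ~ a0 + a1/mu, returned as a predictor."""
    xs = [1.0 / m for m in mus]
    xbar, ybar = mean(xs), mean(alphas)
    a1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, alphas)) / sum(
        (x - xbar) ** 2 for x in xs)
    a0 = ybar - a1 * xbar
    return lambda mu: max(a0 + a1 / mu, 1e-6)  # keep the prior centre positive


def map_dispersion(counts, mu, trend_alpha, prior_sd=1.0):
    """Steps 4-5: maximise NB log-likelihood + normal log-prior on log(alpha)."""
    centre = math.log(trend_alpha)

    def log_post(s):
        return nb_loglik(counts, mu, math.exp(s)) - (s - centre) ** 2 / (2 * prior_sd ** 2)

    return math.exp(argmax_1d(log_post, LOG_LO, LOG_HI))


if __name__ == "__main__":
    # Toy normalised counts (invented for illustration), six samples per gene.
    genes = {
        "low": [3, 0, 5, 1, 2, 6],
        "mid": [40, 25, 60, 33, 51, 29],
        "high": [510, 430, 620, 480, 555, 470],
        "higher": [2100, 1850, 2400, 1990, 2250, 2050],
    }
    fits = {name: genewise_mle(c) for name, c in genes.items()}
    trend = fit_trend([m for m, _ in fits.values()], [a for _, a in fits.values()])
    for name, counts in genes.items():
        mu, mle = fits[name]
        final = map_dispersion(counts, mu, trend(mu))
        print(f"{name:>6}: mean={mu:8.1f}  MLE={mle:.4f}  trend={trend(mu):.4f}  MAP={final:.4f}")
```

Comparing the MLE, trend and MAP columns shows each final estimate lying between the gene's own MLE and its trend value, which is the shrinkage from black dots toward the red curve described in the steps.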
What happens is that genes with lower counts tend to have higher, noisier dispersion estimates, and these are 'shrunk' further toward the red curve than those of genes with higher counts, whose estimates are lower and more precise.
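Why do low-count genes move further? A normal-normal approximation on the log scale gives the intuition (this is a simplification, not DESeq2's exact calculation): the final value is a precision-weighted average of the gene's log-MLE and the trend, so an estimate with a larger standard error, as is typical of low counts, gets a larger shrinkage fraction. The standard errors and prior width below are made-up illustrative values, and `shrink_log_dispersion` is a hypothetical helper.

```python
import math


def shrink_log_dispersion(log_mle, se_mle, log_prior, prior_sd):
    """Normal-normal shrinkage of a log-dispersion estimate toward the trend.

    Returns (posterior mean on the log scale, fraction of the way moved toward the prior).
    """
    w_data = 1.0 / se_mle ** 2      # precision of the gene's own estimate
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior (the trend)
    post = (w_data * log_mle + w_prior * log_prior) / (w_data + w_prior)
    return post, w_prior / (w_data + w_prior)


if __name__ == "__main__":
    mle, trend, prior_sd = 0.5, 0.1, 0.5
    # Hypothetical standard errors: noisy for a low-count gene, tight for a high-count gene.
    for label, se in (("low-count", 1.0), ("high-count", 0.2)):
        post, frac = shrink_log_dispersion(math.log(mle), se, math.log(trend), prior_sd)
        print(f"{label:>10}: shrinkage={frac:.2f}  final alpha={math.exp(post):.3f}")
```

With these numbers the low-count gene moves 80% of the way toward the trend (on the log scale), while the high-count gene moves only about 14%.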
That's my take. Also see the developer's explanation on this subject: