I just want to confirm the behavior the dispersion estimate in DESeq2. I simulated some data with a true dispersion of 0 using the following:
require(DESeq2)
ngenes <- 5000
reps <- 3
freq.cv <- 2
sample.reads <- 1E6
sdlog <- sqrt( log( freq.cv^2+1 ) )
frequencies <- rlnorm(ngenes, sdlog=sdlog)
frequencies <- frequencies/sum(frequencies)
A <- NULL
mu <- frequencies * sample.reads
for (j in 1:(2*reps)) {
A <- cbind(A, rpois(ngenes, mu))
}
colnames(A) <- c(paste('untreated.', 1:reps, sep=''), paste('treated.', 1:reps, sep=''))
colData <- data.frame(c(rep('untreated',reps), rep('treated',reps)))
rownames(colData) <- names(A)
names(colData) <- 'condition'
cds <- DESeqDataSetFromMatrix(A, colData, ~condition)
cds <- estimateSizeFactors(cds)
cds <- DESeq(cds, fitType='parametric')
plotDispEsts(cds, CV=TRUE)
x <- 1:10000
y <- 1/sqrt(x)
lines(x,y,col='green',lwd=2)
And the CV plot is
It looks like the estimated dispersion pretty much tracks the Poisson variance. I would expect the variance of the negative binomial itself to be equal to the Poisson variance when the true dispersion is 0. Does this mean the overall estimated variance of the negative binomial is twice that of the Poisson variance when the true dispersion 0? I recognize this won't add much to the variance at high counts but seems like at low counts it would overestimate the variance by a factor of two. Is this correct?
Many thanks for any clarification here.