I've been exposed to what I believe is conflicting information about the relationship between gene read counts and variance or dispersion (perhaps these two terms need to be disentangled?).
For example, in Figure 3 of this paper, we can clearly see an increase in SD with the mean: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-304
...and similarly on p. 47 of this PDF, though with respect to variance instead of SD: http://www.nathalievialaneix.eu/doc/pdf/tutorial-rnaseq.pdf
While in this example, we see the opposite trend: https://www.researchgate.net/figure/Mean-variance-relationships-Gene-wise-means-and-variances-of-RNA-seq-data-are_fig1_260022492
...and similarly here, though with respect to dispersion instead of SD: "A: Small dispersion values in differential expression analysis"
The relationship between mean read counts and dispersion/variance seems to be an important consideration in RNA-seq analysis, and it is typically referenced as common knowledge in this field. But which of these trends captures that common knowledge?
My prior experience with data in general leads me to believe that higher read counts == more certainty == less variance, but, of course, due to biological variation, this might not be true (though perhaps that's exactly the distinction between variance and dispersion?).
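To check my own framing, I put together a small simulation (my own sketch, not from any of the linked papers), assuming a negative binomial model with a fixed dispersion alpha, as used in DESeq-style analyses. The value alpha = 0.1 is just an assumption for illustration. Under that model Var = mu + alpha * mu^2, so the raw variance and SD grow with the mean even when the dispersion itself is constant, while the coefficient of variation shrinks toward sqrt(alpha):

```python
# Sketch: negative binomial counts with a fixed (hypothetical) dispersion alpha.
# Shows that variance/SD increase with the mean while CV decreases,
# even though the dispersion parameter itself never changes.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                   # assumed dispersion, constant across genes
means = [5, 50, 500, 5000]

for mu in means:
    # numpy parameterizes the negative binomial by (n, p);
    # convert from (mu, alpha): n = 1/alpha, p = n / (n + mu)
    n = 1.0 / alpha
    p = n / (n + mu)
    counts = rng.negative_binomial(n, p, size=100_000)
    sd = counts.std()
    print(f"mean={mu:>5}  variance={counts.var():>10.1f}  SD={sd:>7.1f}  CV={sd / counts.mean():.3f}")
```

If I'm reading it correctly, this at least shows that a mean-vs-variance plot and a mean-vs-dispersion (or mean-vs-CV) plot can show opposite-looking trends from the very same data, which may be part of what is confusing me.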
If anyone can provide some insight, I would be very grateful.
Thanks for your response, Kevin.
I am still confused as to how the relationship could be so blatantly reversed in certain cases (e.g. in the first two links). If DESeq's publication captures the truest relationship between counts and dispersion, I would expect "nuance" between datasets to result in, perhaps, a less clear trend, but the complete reversal is a mystery to me.