Difference Between "Pooled" And "Blind" In Deseq Dispersion Estimates?
11.1 years ago

Does anyone know what the difference is between dispersion estimates obtained using method="blind" vs method="pooled" in the estimateDispersions() function in DESeq? (I haven't migrated to DESeq2 yet; I'm using DESeq 1.10.1.)

From reading the vignette and reference manual, I get the impression that both methods produce a single dispersion estimate per gene, disregarding the experimental condition of each sample. But there must be other differences; why else would they be separate options? Looking at example code, it seems that method="blind" goes together with sharingMode="fit-only" (for DE analysis without replicates), but I wonder whether that is a misinterpretation on my part.

deseq differential-expression • 9.3k views

Maybe with "pooled" the dispersion from the fitted regression is considered together with the higher per-gene dispersion estimates caused by outliers (so sharingMode="maximum" is usable), whereas with "blind" only the fit is considered (so sharingMode="fit-only" is the only suitable choice)?
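
For what it's worth, sharingMode is a separate argument from method: as I understand the DESeq manual, it only decides how each gene's empirical dispersion is combined with the value read off the fitted mean-dispersion curve. A minimal Python sketch of that rule follows; the function name shared_dispersion and the numbers are made up for illustration, and this is not DESeq's code.

```python
def shared_dispersion(empirical, fitted, sharing_mode):
    """Pick the dispersion used for testing one gene.

    empirical: the gene's own dispersion estimate (hypothetical value)
    fitted:    the value of the fitted mean-dispersion curve for this gene
    """
    if sharing_mode == "fit-only":
        # The empirical value only informed the fit; it is ignored afterwards.
        return fitted
    if sharing_mode == "maximum":
        # Conservative choice: never go below the gene's own estimate.
        return max(empirical, fitted)
    if sharing_mode == "gene-est-only":
        # No sharing at all: trust the per-gene estimate.
        return empirical
    raise ValueError(f"unknown sharing_mode: {sharing_mode!r}")


# A gene whose own estimate is well above the fitted curve (e.g. an outlier).
outlier_gene = {"empirical": 0.5, "fitted": 0.2}
for mode in ("fit-only", "maximum", "gene-est-only"):
    print(mode, shared_dispersion(sharing_mode=mode, **outlier_gene))
```

With no replicates the per-gene estimate is not trustworthy on its own, which would explain why the examples pair method="blind" with sharingMode="fit-only".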


Yes, maybe that's it. Thanks for the suggestion.


I had a quick look at the code and there are some differences, apart from a few checks for the existence of replicates in the pooled method. I will explore the code in more depth as soon as I can, but as a first step towards clarifying things I would like to visually compare the dispersion plots produced by the two methods.


Maybe it's something like the difference between the standard "pooled variance" (http://en.wikipedia.org/wiki/Pooled_variance) vs the "normal" variance. Thanks a lot for checking.


The graphs are indeed different. If and when I manage to understand more about this issue I will report here. Thanks for the link!

11.1 years ago

From the documentation:

pooled - Use the samples from all conditions with replicates to estimate a single pooled empirical dispersion value, called "pooled", and assign it to all samples.

per-condition - For each condition with replicates, compute a gene's empirical dispersion value by considering the data from samples for this condition. For samples of unreplicated conditions, the maximum of empirical dispersion values from the other conditions is used. If object has a multivariate design (i.e., if a data frame was passed instead of a factor for the condition argument in newCountDataSet), this method is not available. (Note: This method was called “normal” in previous versions.)

blind - Ignore the sample labels and compute a gene's empirical dispersion value as if all samples were replicates of a single condition. This can be done even if there are no biological replicates. This method can lead to loss of power; see the vignette for details. The single estimated dispersion condition is called "blind" and used for all samples.
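
To make the three descriptions concrete, here is a toy Python sketch of which samples feed each estimate for a single gene. It is an illustration under stated assumptions, not DESeq's implementation: it works on plain sample variances of already-normalized counts rather than on dispersions, and the function gene_variance and the example counts are hypothetical.

```python
from statistics import variance


def gene_variance(counts_by_condition, method):
    """Variance of one gene's counts under the grouping rule named by `method`.

    counts_by_condition: dict mapping condition label -> list of counts
    method: "pooled", "per-condition" or "blind" (names taken from the docs)
    """
    if method == "blind":
        # Ignore the labels: every sample is a replicate of one condition.
        everything = [x for v in counts_by_condition.values() for x in v]
        return variance(everything)

    # The other two methods only look at conditions that have replicates.
    replicated = {c: v for c, v in counts_by_condition.items() if len(v) >= 2}
    if not replicated:
        raise ValueError(f"method={method!r} needs a replicated condition")

    if method == "pooled":
        # Within-condition variances, weighted by their degrees of freedom.
        num = sum((len(v) - 1) * variance(v) for v in replicated.values())
        den = sum(len(v) - 1 for v in replicated.values())
        return num / den

    if method == "per-condition":
        # One value per condition; unreplicated ones borrow the maximum.
        per = {c: variance(v) for c, v in replicated.items()}
        fallback = max(per.values())
        return {c: per.get(c, fallback) for c in counts_by_condition}

    raise ValueError(f"unknown method: {method!r}")


with_reps = {"A": [10.0, 12.0, 14.0], "B": [40.0, 44.0, 48.0], "C": [25.0]}
no_reps = {"A": [10.0], "B": [40.0]}

print(gene_variance(with_reps, "pooled"))         # within-condition spread only
print(gene_variance(with_reps, "per-condition"))  # C borrows the largest value
print(gene_variance(with_reps, "blind"))          # also absorbs the A/B/C shift
print(gene_variance(no_reps, "blind"))            # works without replicates
```

In this toy, "pooled" measures only the spread inside each replicated condition, while "blind" also counts the difference between conditions as noise. For a gene that really is differentially expressed, that inflates the estimate, which is consistent with the loss of power the documentation warns about.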

Hope this helps.


I have read this in the documentation (I should have stated that in the question, sorry about that), and yet it's not clear to me exactly what the difference is between "using the samples from all conditions with replicates to estimate a single pooled empirical dispersion value" and "ignoring the sample labels and computing a gene's empirical dispersion value as if all samples were replicates of a single condition".


Reading the manual description carefully, it emerges that "pooled" uses only the samples from conditions that have biological replicates. The "blind" method, in contrast, can be applied even when there are no biological replicates. I guess the number of samples is also an important factor here for the estimation.
