Question: Difference Between "Pooled" And "Blind" In DESeq Dispersion Estimates?
Mikael Huss (Stockholm) wrote, 6.4 years ago:

Does anyone know what the difference is between dispersion estimates obtained using method="blind" vs method="pooled" in the estimateDispersions() function in DESeq? (I haven't migrated to DESeq2 yet; I'm using DESeq 1.10.1.)

From reading the vignette and reference manual, I get the impression that both of these methods compute a single dispersion estimate for each gene, disregarding the particular experimental condition of each sample. But there must be other differences; why else have them as separate options? Looking at example code, it seems like method="blind" goes together with sharingMode="fit-only" (for DE analysis without replicates), but I wonder if that is a misinterpretation on my part.

modified 6.4 years ago by Biomonika (Noolean) • written 6.4 years ago by Mikael Huss

Maybe with method="pooled" the dispersion fitted by the regression is considered together with the higher per-gene dispersion estimates caused by outliers (so sharingMode="maximum" is usable), while with method="blind" only the fit is considered (so sharingMode="fit-only" is the only suitable choice)?

written 6.4 years ago by vodka80

Yes, maybe that's it. Thanks for the suggestion.

written 6.4 years ago by Mikael Huss

I quickly checked the code and there are some differences, apart from some checks for the existence of replicates in the pooled method. I will explore the code more deeply as soon as I can, but as a first step towards clarifying things I would like to visually compare the dispersion plots obtained with the two methods.

ADD REPLYlink written 6.4 years ago by vodka80

Maybe it's something like the difference between the standard "pooled variance" (http://en.wikipedia.org/wiki/Pooled_variance) vs the "normal" variance. Thanks a lot for checking.
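For reference, these are the textbook definitions behind that analogy, with k groups of sizes n_i, within-group sample variances s_i^2, and n samples in total. Whether DESeq computes exactly these quantities is an assumption here, not something confirmed in this thread.

```latex
s_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1)\, s_i^2}{\sum_{i=1}^{k} (n_i - 1)}
\qquad \text{vs.} \qquad
s^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})^2
```

The pooled form only measures spread around each group's own mean, while the plain form measures spread around the overall mean and so also picks up differences between groups.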

written 6.4 years ago by Mikael Huss

The graphs are indeed different. If and when I manage to understand more about this issue I will report here. Thanks for the link!

written 6.4 years ago by vodka80
Biomonika (Noolean) (State College, PA, USA) wrote, 6.4 years ago:

From the documentation:

pooled - Use the samples from all conditions with replicates to estimate a single pooled empirical dispersion value, called "pooled", and assign it to all samples.

per-condition - For each condition with replicates, compute a gene's empirical dispersion value by considering the data from samples for this condition. For samples of unreplicated conditions, the maximum of empirical dispersion values from the other conditions is used. If object has a multivariate design (i.e., if a data frame was passed instead of a factor for the condition argument in newCountDataSet), this method is not available. (Note: This method was called “normal” in previous versions.)

blind - Ignore the sample labels and compute a gene's empirical dispersion value as if all samples were replicates of a single condition. This can be done even if there are no biological replicates. This method can lead to loss of power; see the vignette for details. The single estimated dispersion condition is called "blind" and used for all samples.
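To make the wording of "pooled" and "blind" concrete, here is a minimal Python sketch of the two variance calculations as the descriptions above read. It is not DESeq code: the function names and the toy counts are hypothetical, size-factor normalization is skipped, and the later step of turning variances into dispersions and fitting a curve is left out.

```python
from statistics import mean, variance


def blind_variance(values):
    """Sample variance with labels ignored: all samples count as one condition."""
    return variance(values)


def pooled_variance(values, conditions):
    """Within-condition variance, pooled over the conditions that have replicates."""
    groups = {}
    for value, cond in zip(values, conditions):
        groups.setdefault(cond, []).append(value)
    # Only conditions with at least two samples contribute (assumption based on
    # the phrase "all conditions with replicates" in the documentation).
    replicated = [g for g in groups.values() if len(g) > 1]
    if not replicated:
        raise ValueError("pooled variance needs at least one replicated condition")
    sum_sq = sum(sum((v - mean(g)) ** 2 for v in g) for g in replicated)
    deg_free = sum(len(g) - 1 for g in replicated)
    return sum_sq / deg_free


# Hypothetical normalized counts for one gene: 3 control and 3 treated samples.
counts = [100.0, 110.0, 90.0, 200.0, 210.0, 190.0]
conds = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]

print("pooled:", pooled_variance(counts, conds))
print("blind: ", blind_variance(counts))
```

With these toy numbers the pooled variance is 100 while the blind variance is 3080, because the blind calculation treats the real difference between control and treated as noise. That inflation is consistent with the documentation's warning that "blind" can lead to loss of power.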

Hope this helps.

written 6.4 years ago by Biomonika (Noolean)

I have read this in the documentation - should have stated that in the question, sorry about that - and yet it's not clear to me exactly what the difference is between "using the samples from all conditions with replicates to estimate a single pooled empirical dispersion value" and to "ignore the sample labels and compute a gene's empirical dispersion value as if all samples were replicates of a single condition".

written 6.4 years ago by Mikael Huss

Reading the manual description carefully, it emerges that "pooled" refers to samples that have biological replicates. The "blind" method is instead applied to samples with no biological replicates. I guess the number of samples is also an important factor here for the normalization.

written 3.4 years ago by kristina.gagalova0