Question

Is inspection and correction of p-values still needed in DESeq2?

3

8.8 years ago

jake.hagen ▴ 50

While trying to better understand the inner-workings of the DESeq2 package I came across this tutorial:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf

The part that confused me was section 5.2.2, the inspection and correction of p-values; I had not seen anything like it in the DESeq2 package vignette. After I tried it, my results changed drastically and no longer agreed with other packages (voom/limma).

I have not seen anything about this in the May 2015 vignette. Is it possible that this step is no longer needed?

(I did see the section of the vignette that mentions what has changed since the 2014 paper, and while it seems this could be part of what it is referring to, my statistical knowledge is too limited to be sure.)

Thank you for any help you can provide.

RNA-Seq DESeq2 • 4.2k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by jake.hagen ▴ 50

1

Just putting it out there that I don't agree with the reasoning in that vignette. A uniform distribution of p-values is only what you should expect when the null hypothesis holds. If there is a large difference between your experimental arms, you may see an enrichment of low p-values precisely because the null hypothesis does not hold for a great many genes.

If you see a curious, non-uniform distribution in your p-values, you should really be plotting out the p-value distribution that results from analysing a bunch of label-permuted versions of your original dataset. If this distribution isn't flat then there is something wrong with your model.
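To illustrate the first point, here is a minimal simulation sketch in Python (standard library only). It assumes idealized z-tests with unit variance, and the gene count, the 30% fraction of affected genes and the effect size of 3 are arbitrary illustrative choices, not values from any real dataset. It only compares the histogram shape of p-values when every gene is null with the shape when a sizeable fraction of genes has a true effect; it does not implement the label-permutation check itself.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_p(z):
    """Two-sided p-value of a z statistic under a N(0, 1) null."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))

def histogram(pvals, bins=10):
    """Count p-values in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

rng = random.Random(1)
n_genes = 10000  # hypothetical number of genes

# Every gene is null: the p-values should be roughly uniform.
null_p = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_genes)]

# 30% of genes carry a true shift (assumed effect size 3 on the z scale).
mixed_p = [two_sided_p(rng.gauss(3.0 if i < 3000 else 0.0, 1.0))
           for i in range(n_genes)]

null_hist = histogram(null_p)
mixed_hist = histogram(mixed_p)
print("all null :", null_hist)
print("30% real :", mixed_hist)
```

The first histogram should be roughly flat, while the second should show a clear spike in the lowest bin on top of a flat background.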

ADD REPLY • link 7.1 years ago by russhh 5.7k

0

Hi,

Were you able to resolve the problem? I am facing the same situation: I see a similar pattern before correcting the p-values. Should we apply this correction, or can we just follow the DESeq2 manual? Kindly guide me.

ADD REPLY • link 7.1 years ago by EVR ▴ 610

0

Check out this article, which might help in your interpretation of p value distributions

ADD REPLY • link 7.1 years ago by andrew.j.skelton73 6.5k

0

It has been a while since I looked at this, and I am not sure what I ended up doing (probably just followed the manual without empirically estimating the null model variance).

Do you have more than two conditions that you model together? That was my case for the above plot, and if I were facing the same issue again I might try modeling that comparison separately. I believe DESeq2 estimates the variance per gene across all comparisons. In my case the other comparisons had a lot more variation, so the variance used for the above contrast would be too high, giving the distribution we see.
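A rough way to see why this could happen is the toy sketch below. It is an analogy only: it uses Gaussian data and a pooled within-group variance, whereas DESeq2 fits negative binomial dispersions, so nothing here reproduces DESeq2's actual estimator. The group sizes, means and standard deviations are made-up illustrative values.

```python
import random
from statistics import variance

def pooled_variance(groups):
    """Within-group variance pooled over all groups (weighted by degrees of freedom)."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator

rng = random.Random(3)

# Hypothetical expression values for one gene in three conditions.
# A and B are quiet (sd 1); C is much noisier (sd 4).
group_a = [rng.gauss(10.0, 1.0) for _ in range(50)]
group_b = [rng.gauss(10.0, 1.0) for _ in range(50)]
group_c = [rng.gauss(10.0, 4.0) for _ in range(50)]

var_ab = pooled_variance([group_a, group_b])
var_abc = pooled_variance([group_a, group_b, group_c])

print(f"variance from A and B only : {var_ab:.2f}")
print(f"variance pooled over A,B,C : {var_abc:.2f}")
```

If the A versus B contrast is tested with the variance pooled over all three groups, its test statistics shrink toward zero, which would push the p-values for that contrast toward 1.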

ADD REPLY • link 7.1 years ago by jake.hagen ▴ 50

score 3 · Answer 1 · 2015-07-07

One should always plot the unadjusted p-value distribution when doing multiple testing, and this is completely separate from DESeq2 (or any other package). If it has an unexpected distribution, then you should suspect that your null model is wrong (and you probably have no significant results). If that's the case and you go on to estimate an empirical null distribution and compute the FDR from that, then of course your results will differ from what you would have obtained otherwise.

Ram · Answer 2 · 2015-07-07

3

8.8 years ago

Michael Love ★ 2.6k

That section is a description of the fdrtool method by Bernd Klaus (the tutorial is also by Bernd). It is not officially part of the DESeq2 method, but as Devon points out, it is an optional software package downstream of any tool which produces p-values (and so similar to other downstream methods like qvalue).

Note that you should not use fdrtool in combination with lfcThreshold, because we use a conservative approach to threshold testing such that the null p-values are no longer expected to be uniform.
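The effect of a conservative threshold test on null p-values can be sketched as follows. This is a simplified stand-in, not DESeq2's code: it assumes a z statistic with unit standard error, a threshold `t` expressed in standard-error units, and a p-value of the form min(1, 2 * P(Z > |z| - t)). The function name `threshold_p` and the threshold value 1.0 are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def threshold_p(z, t):
    """Conservative two-sided p-value for H0: |effect| <= t (t in SE units).

    With t = 0 this reduces to the usual two-sided z-test p-value.
    """
    return min(1.0, 2.0 * (1.0 - STD_NORMAL.cdf(abs(z) - t)))

rng = random.Random(7)
z_null = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # true effect is zero

p_plain = [threshold_p(z, 0.0) for z in z_null]   # ordinary test
p_thresh = [threshold_p(z, 1.0) for z in z_null]  # threshold test

frac_plain_top = sum(p > 0.9 for p in p_plain) / len(p_plain)
frac_thresh_top = sum(p > 0.9 for p in p_thresh) / len(p_thresh)

print(f"share of p > 0.9, no threshold  : {frac_plain_top:.3f}")
print(f"share of p > 0.9, with threshold: {frac_thresh_top:.3f}")
```

Under these assumptions the ordinary test puts about a tenth of the null p-values above 0.9, while the threshold test piles most of them up near 1. A method that expects a uniform null component would misread that pile-up.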

See the diagnostic plots section of our RNA-seq workflow for how to plot a histogram of p-values.

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by Michael Love ★ 2.6k

1

Thank you for the reply. I understand now that plotting the p-value distribution is a diagnostic step and is not specific to the DESeq2 package. I am struggling with when and why to adjust p-values with empirical null modeling. The way I read the linked tutorial, it should be done whenever the plotted p-values are neither uniform nor uniform with a peak near zero.

For example, the p-value histogram below is from a condition in which I get few or no differentially expressed genes. What information would you get from this, and what would your next steps be?

If I were following the tutorial, I would conclude that the variance of the null distribution was too high, and use fdrtool to recalculate the p-values with a newly estimated variance, using the Wald test statistics as input. This gives more DEGs, but is it correct?
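The idea behind that recalculation can be sketched like this. It is not fdrtool's algorithm: as a stand-in it estimates the null standard deviation with a scaled median absolute deviation (MAD), which assumes that most genes are null. The simulated statistics (standard deviation 0.6) are invented to mimic a test whose assumed null variance is too large.

```python
import random
from statistics import NormalDist, median

def two_sided_p(z, sigma=1.0):
    """Two-sided p-value of z under a N(0, sigma^2) null."""
    return 2.0 * (1.0 - NormalDist(0.0, sigma).cdf(abs(z)))

rng = random.Random(42)

# Hypothetical Wald statistics that are all null but too tightly
# concentrated around zero for a N(0, 1) reference distribution.
z = [rng.gauss(0.0, 0.6) for _ in range(10000)]

# Robust estimate of the null sd (assumption: the bulk of genes are null).
med = median(z)
sigma_hat = 1.4826 * median([abs(v - med) for v in z])

p_theoretical = [two_sided_p(v) for v in z]            # N(0, 1) null
p_empirical = [two_sided_p(v, sigma_hat) for v in z]   # rescaled null

frac_theoretical = sum(p < 0.05 for p in p_theoretical) / len(z)
frac_empirical = sum(p < 0.05 for p in p_empirical) / len(z)

print(f"estimated null sd        : {sigma_hat:.3f}")
print(f"p < 0.05, N(0, 1) null   : {frac_theoretical:.4f}")
print(f"p < 0.05, empirical null : {frac_empirical:.4f}")
```

In this toy setting the N(0, 1) null gives almost no small p-values, and the rescaled null brings the share below 0.05 back to roughly 5%. Whether the rescaling is justified on real data depends on the assumption that most genes are null, which the sketch cannot check.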

Again thank you for your help.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by jake.hagen ▴ 50

0

Dear Michael, I came across a similar problem in a DESeq2 analysis today, and I have two questions to clarify the point that was raised here. A reply would be much appreciated.

(1) You say fdrtool should not be used with lfcThreshold, but the default for lfcThreshold is 0 in DESeq2. Does that mean your comment only applies to cases where lfcThreshold is set to a value != 0, and that the raw p-value distribution under the null is expected to be uniform (as usual) in a "default" DESeq2 analysis?

(2) I have a dataset (3 vs 3) with a hill-shaped raw p-value distribution (even when focusing on genes with baseMean > 0, as pointed out in the vignette) and 0 DE genes, indicating (as Bernd puts it) an "overestimation of the variance in the null distribution. Thus, the N(0, 1) null distribution of the Wald test is not appropriate here." When using fdrtool, I get DE genes, which make sense to me (when looking at base mean, log2fc etc.). Do you agree this is a sensible and correct approach?

ADD REPLY • link 3.7 years ago by chrarnold84 • 0