While trying to better understand the inner workings of the DESeq2 package, I came across this tutorial:
http://www-huber.embl.de/users/klaus/Teaching/DESeq2-Analysis.pdf
The part that confused me was section 5.2.2, the inspection and correction of p-values. I had not seen anything like that in the DESeq2 package vignette. After I applied the correction, my results changed drastically and no longer agree with other packages (voom/limma).
I have not seen anything about this in the May 2015 vignette. Is it possible that this step is no longer needed?
(I did see the section of the vignette that describes what has changed since the 2014 paper, and while this could be part of what it refers to, my statistical knowledge is too limited to be sure.)
Thank you for any help you can provide.
Just putting it out there that I don't agree with the reasoning in that tutorial. A uniform distribution of p-values is only what you should expect when the null hypothesis holds for every gene. If there is a large difference between your experimental arms, you may see an enrichment of low p-values precisely because the null hypothesis does not hold for a great many genes.
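To make that concrete, here is a rough simulation sketch. It is a toy Gaussian setup with a plain z-test, not DESeq2's negative binomial model, and every name in it (`simulate_pvalues`, `frac_de`, `effect`) is made up for illustration. It compares the p-value histogram when no gene differs between arms with the histogram when an assumed 30% of genes are truly different.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()

def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulate_pvalues(n_genes, frac_de, effect, n_per_group=5, seed=0):
    """One p-value per gene; the first frac_de of genes get a true mean shift."""
    rng = random.Random(seed)
    pvals = []
    for g in range(n_genes):
        shift = effect if g < frac_de * n_genes else 0.0
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        pvals.append(two_sample_z_pvalue(a, b))
    return pvals

def histogram(pvals, bins=10):
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

# All genes null: the histogram should look roughly flat.
null_hist = histogram(simulate_pvalues(5000, frac_de=0.0, effect=0.0))
# 30% of genes truly different: expect a spike in the lowest bin.
mixed_hist = histogram(simulate_pvalues(5000, frac_de=0.3, effect=2.0))

print("all null:", null_hist)
print("30% DE  :", mixed_hist)
```

In the second histogram the bins away from zero stay roughly level while the first bin is inflated, which is the pattern you would expect from a real biological difference rather than from a broken model.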
If you see a curious, non-uniform distribution in your p-values, you should really be plotting out the p-value distribution that results from analysing a bunch of label-permuted versions of your original dataset. If this distribution isn't flat then there is something wrong with your model.
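A minimal sketch of that permutation check, again on toy Gaussian data rather than real counts. The helper names and the settings (30 samples per group, 20 permutations, a normal approximation to the Welch statistic) are assumptions for illustration only. With a real dataset you would rerun your actual model on each shuffled labelling instead.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_z_pvalue(a, b):
    """Two-sided p-value from a Welch-type statistic, normal approximation."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    z = (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def pvalues_for_labels(data, labels):
    """One p-value per gene (row) for a given assignment of samples to groups."""
    pvals = []
    for row in data:
        a = [x for x, lab in zip(row, labels) if lab == 0]
        b = [x for x, lab in zip(row, labels) if lab == 1]
        pvals.append(welch_z_pvalue(a, b))
    return pvals

def histogram(pvals, bins=10):
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

rng = random.Random(1)
n_genes, n_per_group = 1000, 30
labels = [0] * n_per_group + [1] * n_per_group
# The first 30% of genes get a true shift of 1.0 in group 1 (assumed values).
data = [[rng.gauss(1.0 * lab if g < 0.3 * n_genes else 0.0, 1.0) for lab in labels]
        for g in range(n_genes)]

observed_hist = histogram(pvalues_for_labels(data, labels))

# Pool p-values over several label permutations of the same dataset.
permuted_pvals = []
for _ in range(20):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    permuted_pvals.extend(pvalues_for_labels(data, shuffled))
permuted_hist = histogram(permuted_pvals)

print("true labels    :", observed_hist)
print("permuted labels:", permuted_hist)
```

Pooling over several permutations matters: a single shuffle can end up partly correlated with the true labels by chance, which leaves a residual signal in the truly different genes.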
Hi,
Were you able to resolve the problem? I face the same situation: I see a similar pattern before correcting the p-values. Should we apply this correction, or can we just follow the DESeq2 manual? Kindly guide me.
Check out this article, which might help in your interpretation of p-value distributions.
It has been a while since I looked at this, and I am not sure what I ended up doing (probably just followed the manual without empirically estimating the null model variance).
Do you have more than two conditions that you model together? That was my case for the above plot, and if I faced the same issue again I might try modeling that comparison separately. I believe DESeq2 estimates the variance per gene across all samples in the model, not just those in the contrast being tested. In my case the other conditions had a lot more variation, so the variance used for the above contrast would have been too high, giving the distribution we see.
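Here is a toy sketch of that effect, under loud assumptions: Gaussian data, a pooled-variance z-test in place of DESeq2's per-gene dispersion estimate, and made-up group names A, B and C, where C is much noisier than the two groups being compared. No gene truly differs between A and B.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_and_ss(xs):
    """Mean and sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs)

def pvalue_a_vs_b(groups, variance_from):
    """Test A vs B, pooling within-group variance over the groups in variance_from."""
    stats = {name: mean_and_ss(xs) for name, xs in groups.items()}
    ss = sum(stats[name][1] for name in variance_from)
    df = sum(len(groups[name]) - 1 for name in variance_from)
    pooled_var = ss / df
    se = (pooled_var * (1 / len(groups["A"]) + 1 / len(groups["B"]))) ** 0.5
    z = (stats["A"][0] - stats["B"][0]) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def histogram(pvals, bins=10):
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

rng = random.Random(2)
pair_pvals, all_pvals = [], []
for _ in range(2000):
    groups = {
        "A": [rng.gauss(0.0, 1.0) for _ in range(30)],
        "B": [rng.gauss(0.0, 1.0) for _ in range(30)],
        "C": [rng.gauss(0.0, 4.0) for _ in range(30)],  # noisy third condition
    }
    pair_pvals.append(pvalue_a_vs_b(groups, ("A", "B")))       # variance from A and B only
    all_pvals.append(pvalue_a_vs_b(groups, ("A", "B", "C")))   # variance from all groups

pair_hist = histogram(pair_pvals)
all_hist = histogram(all_pvals)
print("variance from A, B   :", pair_hist)
print("variance from A, B, C:", all_hist)
```

When the noisy group feeds into the variance estimate, the A vs B test becomes conservative and the p-values pile up toward 1 instead of being flat. Modeling that pair separately removes the effect in this toy case. Whether the same trade-off holds for a real dataset depends on how different the groups' variability actually is.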