GSEA pvalue output of 0.0. Can I make it more specific?
3.7 years ago
RNAseqer ▴ 260

I would like to know the exact p-value I'm getting from GSEA. However, GSEA reports a p-value of 0.0 whenever the value is lower than 1/(number of permutations). The manual recommends increasing the number of permutations to get a more specific p-value, but I would like, if possible, to get the exact p-value for this number of permutations (1000). Is there a setting I can change to get the program to output the exact value without rounding it off to 0.0? Thanks!

GSEA P-value • 3.1k views

You will almost never be able to calculate an exact p-value using permutations. To have a chance, you would need to enumerate all possible permutations of the data. Even then, it is possible that no permuted value will be as or more extreme than the observed one.

If no simulated observations were as or more extreme than observed after 1000 iterations, your real p-value is likely much smaller than your threshold value anyway, so I wouldn't worry about it.


The number of permutations gives you an upper bound on the "real" p-value. If you run 1000 permutations, you can say your real p-value is < 0.001; if you run 10000 (and the output is 0 again), you can say it is < 0.0001.

3.2 years ago
Gordon Smyth ★ 7.0k

See

Phipson, B, and Smyth, GK (2010). Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology Volume 9, Issue 1, Article 39. http://arxiv.org/abs/1603.05766
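
For readers who want a concrete number rather than 0, here is a minimal Python sketch of the simple (b + 1)/(m + 1) estimator discussed in that paper, where b is the number of permuted statistics at least as extreme as the observed one and m is the number of random permutations. The function name and the toy null values are hypothetical and only for illustration; the paper also derives an exact formula that is not reproduced here.

```python
def permutation_p_value(observed, null_stats):
    """Upper-tail permutation p-value with the +1 correction: (b + 1) / (m + 1).

    observed   -- the statistic computed on the real data
    null_stats -- statistics computed on m randomly permuted data sets
    """
    m = len(null_stats)
    # b counts permuted statistics that are as or more extreme than the observed one
    b = sum(1 for s in null_stats if s >= observed)
    # Adding 1 to numerator and denominator means the result can never be zero
    return (b + 1) / (m + 1)


# Hypothetical null distribution from 1000 permutations (toy values only)
null_stats = list(range(1000))

# No permuted value reaches the observed one, so the estimate is 1/1001, not 0
print(permutation_p_value(2000, null_stats))
```

With 1000 permutations and no permuted value as extreme as the observed one, this reports 1/1001 instead of 0.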

3.2 years ago
alserg ▴ 920

It's rather hard to get a proper P-value estimate from Broad GSEA if the reported value is zero. As people mention in the comments, if you have n permutations and get a zero P-value, you should read it as a statement that the true P-value is < 1/n. Alternatively, if you need a particular number, you can estimate it as 1/(n+1), as suggested in Phipson & Smyth. However, this would be incorrect for Broad GSEA P-values, as they are defined as Pr(ES >= x)/Pr(ES >= 0) for positive enrichment scores and Pr(ES <= x)/Pr(ES <= 0) for negative ones. The normalization by Pr(ES >= 0) makes the minimal possible nonzero P-value around 2/n, not 1/n, so both the < 1/n bound and the 1/(n+1) estimate would need to be adjusted, which is hard to do because some intermediate values are not reported.
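
To illustrate why the smallest nonzero value is around 2/n, here is a minimal Python sketch of the sign-conditional P-value defined above. It is not Broad's code: the function name and the toy null distribution are hypothetical, and it assumes you have the permuted enrichment scores available, which the Broad GSEA output does not fully report.

```python
def gsea_nominal_p(es, null_es):
    """Sign-conditional empirical P-value.

    For es >= 0: Pr(ES >= es) / Pr(ES >= 0), estimated from null_es.
    For es <  0: Pr(ES <= es) / Pr(ES <= 0), estimated from null_es.
    """
    if es >= 0:
        same_sign = [x for x in null_es if x >= 0]
        extreme = sum(1 for x in same_sign if x >= es)
    else:
        same_sign = [x for x in null_es if x <= 0]
        extreme = sum(1 for x in same_sign if x <= es)
    if not same_sign:
        # No permuted score on this side of zero, so the ratio is undefined
        return float("nan")
    return extreme / len(same_sign)


# Hypothetical null from n = 1000 permutations: 500 positive and 500 negative scores
null_es = [i / 1000 for i in range(1, 501)] + [-i / 1000 for i in range(1, 501)]

print(gsea_nominal_p(0.9, null_es))  # no positive null score reaches 0.9, so 0.0
print(gsea_nominal_p(0.5, null_es))  # exactly one positive null score reaches 0.5
```

Because only the roughly n/2 permuted scores with the same sign as the observed score enter the denominator, one extreme permuted score already gives 1/500 = 2/n in this toy example.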

That said, if you are interested in pre-ranked GSEA, I recommend looking at the fgsea R package that we developed (https://www.bioconductor.org/packages/release/bioc/html/fgsea.html, https://www.biorxiv.org/content/10.1101/060012v3). There we have implemented an algorithm to accurately estimate arbitrarily low GSEA P-values, so there are no zero P-value estimates.
