GSEA p-value output of 0.0. Can I make it more specific?
2
0
21 months ago
RNAseqer ▴ 200

I would like to know the exact p-value I'm getting from GSEA. However, GSEA reports a p-value of 0.0 whenever the value is lower than 1/(number of permutations). The manual recommends increasing the number of permutations to get a more specific p-value, but I would like, if possible, to get the exact p-value for this number of permutations (1000). Is there a setting I can change to get the program to output the exact value without rounding it down to 0.0? Thanks!

GSEA P-value • 1.5k views
1

You will almost never be able to calculate an exact p-value using permutations. To have a chance, you would need to enumerate all possible permutations of the data. But even then, it is possible that no simulated value will be as extreme as or more extreme than the observed one.

If no simulated observations were as or more extreme than observed after 1000 iterations, your real p-value is likely much smaller than your threshold value anyway, so I wouldn't worry about it.

0

The number of permutations gives you an upper bound on the "real" p-value. If you run 1000 permutations, you can say your real p-value is < 0.001; if you run 10000 (and the output is 0 again), you can say it is < 0.0001.
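To make the bound above concrete, here is a minimal Python sketch. It is not GSEA's own code: the function names `empirical_pvalue` and `zero_pvalue_bound` and the toy null distribution are made up for illustration, and the 1/n figure is the informal bound described in this comment, not a rigorous confidence limit.

```python
def empirical_pvalue(observed, null_stats):
    """Fraction of permuted statistics that are at least as extreme as the observed one."""
    hits = sum(1 for s in null_stats if s >= observed)
    return hits / len(null_stats)


def zero_pvalue_bound(n_permutations):
    """Informal bound to report when no permuted statistic reached the observed value."""
    return 1.0 / n_permutations


# Toy null distribution (assumption): 1000 permuted statistics between 0.000 and 0.999.
null_stats = [i / 1000 for i in range(1000)]
observed = 1.5  # larger than every permuted statistic

p = empirical_pvalue(observed, null_stats)
print("reported p-value:", p)
print("read it as p <", zero_pvalue_bound(len(null_stats)))
```

With 10000 permutations and again no hits, the same function gives a bound of 0.0001, which is why more permutations tighten the statement without ever producing an exact value.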

1
15 months ago
Gordon Smyth ★ 4.5k

See

Phipson, B, and Smyth, GK (2010). Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology Volume 9, Issue 1, Article 39. http://arxiv.org/abs/1603.05766
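As a rough illustration of why the estimate is never zero, here is a small Python sketch of the simple (b + 1)/(m + 1) form, where b is the number of permuted statistics at least as extreme as the observed one and m is the number of random permutations. The function name `plus_one_pvalue` is hypothetical, and the paper also covers a more exact calculation that accounts for the total number of distinct permutations, which is not shown here.

```python
def plus_one_pvalue(b, m):
    """Permutation p-value estimate (b + 1) / (m + 1).

    b: number of permuted statistics at least as extreme as the observed one
    m: number of random permutations drawn
    """
    if m <= 0 or b < 0 or b > m:
        raise ValueError("need m > 0 and 0 <= b <= m")
    return (b + 1) / (m + 1)


# With 1000 permutations and no hits, the estimate is 1/1001 instead of 0.
print(plus_one_pvalue(0, 1000))
```

Note that this applies to a plain permutation p-value; it does not account for any extra normalization a specific tool applies.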

1
15 months ago
alserg ▴ 740

It's rather hard to get a proper P-value estimate from Broad GSEA if the reported value is zero. As people mention in the comments, if you have n permutations and get a zero P-value, you should read it as a statement that the true P-value is < 1/n. Alternatively, if you need a particular number, you can estimate it as 1/(n+1), as suggested in Phipson & Smyth. However, this would be incorrect for the Broad GSEA P-value, as it is defined as Pr(ES >= x)/Pr(ES >= 0) for positive enrichment scores and Pr(ES <= x)/Pr(ES <= 0) for negative ones. The normalization by Pr(ES >= 0) makes the minimal possible nonzero P-value around 2/n, not 1/n, so the < 1/n and 1/(n+1) estimates have to be adjusted, which is hard to do because some intermediate values are not reported.
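To show where the factor of two comes from, here is a minimal Python sketch of the ratio described above. It is an illustration of the definition as stated in this answer, not Broad GSEA's actual implementation; the function name `gsea_nominal_pvalue` and the toy null scores are assumptions.

```python
def gsea_nominal_pvalue(observed_es, null_es):
    """Nominal p-value as a ratio of counts on the same side of zero as the observed ES.

    Positive ES: #(null >= observed) / #(null >= 0)
    Negative ES: #(null <= observed) / #(null <= 0)
    """
    if observed_es >= 0:
        same_side = [es for es in null_es if es >= 0]
        hits = sum(1 for es in same_side if es >= observed_es)
    else:
        same_side = [es for es in null_es if es <= 0]
        hits = sum(1 for es in same_side if es <= observed_es)
    if not same_side:
        return float("nan")  # no null scores on this side, ratio undefined
    return hits / len(same_side)


# Toy null (assumption): 1000 permuted scores, half negative and half positive.
null_es = [-(i + 1) / 1000 for i in range(500)] + [(i + 1) / 1000 for i in range(500)]

# Smallest nonzero value: exactly one positive null score reaches the observed ES.
smallest_nonzero = gsea_nominal_pvalue(0.5, null_es)
print(smallest_nonzero, "vs 1/n =", 1 / len(null_es))
```

Because only about half of the n permuted scores land on the same side of zero, the denominator is roughly n/2, so the smallest nonzero ratio is roughly 2/n. The exact denominator is what GSEA does not report, which is why the adjustment is hard to do after the fact.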

That said, if you are interested in pre-ranked GSEA, I recommend looking at the fgsea R package that we developed (https://www.bioconductor.org/packages/release/bioc/html/fgsea.html, https://www.biorxiv.org/content/10.1101/060012v3). There we have implemented an algorithm that accurately estimates arbitrarily low GSEA P-values, so there are no zero P-value estimates.