Hello,
I have trouble understanding one of the most important GSEA metrics: the signal strength. According to the documentation (http://software.broadinstitute.org/gsea/doc/GSEAUserGuideTEXT.htm#_HTML_Report):
The enrichment signal strength that combines the two previous statistics: (Tag %) × (1 – Gene %) × (N / (N – Nh)), where N equals the number of genes in the list and Nh is the number of genes in the gene set.
This formula is also implemented in the GSEA R package (https://github.com/hxin/gsea/blob/master/R/gsea.R):
signal.strength[i] <- tag.frac[i] * (1 - gene.frac[i]) * (N / (N - size.G[i]))
However, I am unable to reproduce the reported signal strength with that formula.
I have a signature that performs quite well: tags=93%, list=7% and signal=99%. It contains 147 genes, of which 136 participate in the core enrichment, so tag=136/147=0.93, in agreement with the reported output. The gene rank at the peak position is 1710 out of 23287 total genes, so list=1710/23287=0.07, also in agreement with the reported output.
Based on the above formula, however: signal=136/147 * (1-(1710/23287)) * (23287/(23287-147))=0.86, which differs from the 0.99 reported in the GSEA output.
Also, how can the signal be higher than 100% for some signatures (example: PMID 28522862, supplementary file 4)?
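For anyone who wants to check my arithmetic, here is a minimal Python sketch of the calculation. It only mirrors the documented formula and the R line quoted above. The function and variable names are my own and are not taken from the GSEA code, so this shows how I applied the formula, not how GSEA computes the value internally.

```python
# Reproduce the signal strength from the formula quoted in the user guide.
# All names here are illustrative assumptions, not GSEA identifiers.

def signal_strength(tag_frac, gene_frac, n_total, set_size):
    """(Tag %) x (1 - Gene %) x (N / (N - Nh)), as documented."""
    return tag_frac * (1 - gene_frac) * (n_total / (n_total - set_size))

n_total = 23287    # N: genes in the ranked list
set_size = 147     # Nh: genes in the gene set
core_genes = 136   # genes in the core enrichment
peak_rank = 1710   # rank at the enrichment score peak

tag_frac = core_genes / set_size    # reported as "tags"
gene_frac = peak_rank / n_total     # reported as "list"
signal = signal_strength(tag_frac, gene_frac, n_total, set_size)

print(f"tags={tag_frac:.2f} list={gene_frac:.2f} signal={signal:.2f}")
# prints: tags=0.93 list=0.07 signal=0.86
```

The tags and list values match the report, so the discrepancy seems to be confined to the signal term.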