Enrichment Score Interpretation
8 months ago
Will • 0

How should I interpret the Enrichment Score produced, e.g., by the GSEA function in R? Is a value of 0 or of 1 better? If I have, e.g., -0.17 or +0.80, what exactly is the biological interpretation for the pathways with these ES values? Finally, is it correct, or an error, to have some pathways with an ES of 84, 11 and 238?

RNA-Seq R pathways gsea Enrichment
8 months ago
xanderpico ▴ 360

Here is the GSEA documentation, pointing to the Enrichment Score section in particular: https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Enrichment_Score_(ES)

And here is one of the clearest and most thorough explanations I've found: https://www.pathwaycommons.org/guide/primers/data_analysis/gsea/

Answering your specific questions:

  • Larger magnitudes (closer to -1 or +1) are "better": they indicate stronger enrichment, while an ES near 0 means the gene set shows no particular bias toward either end of the list.
  • The sign of the score simply indicates which end of your ranked gene list is enriched. You provide the ranked list of genes, so the biological interpretation is up to you. If, for example, you provide a gene list ranked by a combination of fold change and p-value (e.g., sign(FC) * -log10(pvalue)), then positive scores are associated with upregulated genes and negative scores with downregulated genes. Caution: some tools reverse this, so manually check a few gene sets to see which convention they use.
  • ES values range from -1 to 1. Normalized ES values will go beyond these bounds. If you are seeing ES values greater than 1 in magnitude (such as 84, 11 or 238), then I would suspect something is wrong.
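As a small illustration of that sign convention, here is a hedged Python sketch that builds the sign(FC) * -log10(p) metric and sorts genes from most up-regulated to most down-regulated. The function name `rank_metric` and the gene values are hypothetical. Note the minus sign: log10 of a p-value is negative, so leaving it out would flip the convention.

```python
import math

def rank_metric(log2_fc, p_value):
    """Signed ranking metric: sign(FC) * -log10(p).

    Up-regulated genes (log2_fc > 0) get positive scores and land at the
    top of the list; down-regulated genes get negative scores.
    """
    sign = math.copysign(1.0, log2_fc) if log2_fc != 0 else 0.0
    return sign * -math.log10(p_value)

# Hypothetical differential-expression results: gene -> (log2 FC, p-value)
de_results = {
    "GENE_A": (2.1, 1e-6),
    "GENE_B": (-1.5, 1e-4),
    "GENE_C": (0.3, 0.2),
    "GENE_D": (-0.1, 0.9),
}

ranked = sorted(
    de_results,
    key=lambda g: rank_metric(*de_results[g]),
    reverse=True,  # largest (most up-regulated) first
)
print(ranked)  # → ['GENE_A', 'GENE_C', 'GENE_D', 'GENE_B']
```

With this ordering, a gene set with a positive ES is concentrated among the up-regulated genes at the top.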

Additional comment: You probably want to look at NES (Normalized ES) if that is provided by the tool you are using. The first two points above apply just the same.

8 months ago
Elucidata ▴ 240

Gene Set Enrichment Analysis (GSEA) is an analytical method to interpret gene expression data.


  • Consider a list L in which genes are ordered by some measure of correlation with the phenotype. The aim of GSEA is to determine whether the members of a gene set tend to occur toward the top or the bottom of the ordered list L. The entire ranked list L is used to assess how the genes of each gene set are distributed across it. To do this, GSEA walks down the ranked list, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not.
  • The enrichment score (ES) is the maximum deviation from zero encountered during that walk.
  • Equivalently, ES is the most extreme value (largest positive or most negative) that the running sum reaches over the list L.
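The walk described above can be sketched in Python. This is a minimal illustration assuming the weighted running sum from the original GSEA paper (Subramanian et al., 2005); the function name and arguments are hypothetical, and real tools add permutation testing and other details. Because the hit increments sum to 1 and the miss decrements sum to 1, the running sum starts and ends at 0 and can never leave [-1, 1].

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score (weighted scheme, assumed here).

    ranked_genes: gene IDs sorted by the ranking metric (descending)
    scores:       ranking-metric values, same order as ranked_genes
    gene_set:     set of gene IDs to test
    p:            weight exponent (p=0 gives the classic KS statistic)
    Returns (es, running_sum).
    """
    hits = [g in gene_set for g in ranked_genes]
    n = len(ranked_genes)
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must partially overlap the ranked list")
    # Each hit weighs |score|^p; with p=0 every hit weighs 1.
    hit_weights = [abs(s) ** p if h else 0.0 for s, h in zip(scores, hits)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all hits have zero weight")
    miss_step = 1.0 / (n - n_hits)

    running, running_sum = 0.0, []
    for w, h in zip(hit_weights, hits):
        # Step up by the hit's share of total weight, or down by a miss step.
        running += w / total_hit_weight if h else -miss_step
        running_sum.append(running)
    # ES = running-sum value with the largest deviation from zero.
    es = max(running_sum, key=abs)
    return es, running_sum


genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]  # already sorted, descending

es_top, walk = enrichment_score(genes, metric, {"G1", "G2"})
es_bottom, _ = enrichment_score(genes, metric, {"G5", "G6"})
print(round(es_top, 3), round(es_bottom, 3))  # → 1.0 -1.0
```

A set at the top of the list gives a positive ES and a set at the bottom gives a negative one; raw values like 84 or 238 cannot come out of this walk.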

Interpretation of ES value:

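  • The larger the magnitude of the ES, the more strongly the gene set is concentrated toward one end of the ranked list L.
  • ES is a weighted Kolmogorov-Smirnov-like statistic. The exponent p (a tuning parameter) sets how strongly each gene's ranking metric weights the running sum: with p = 0 the ES reduces to the standard Kolmogorov-Smirnov statistic, and with p = 1 (the GSEA default) genes are weighted by their correlation with the phenotype.
  • The Normalized Enrichment Score (NES) is not restricted to [0, 1] or [-1, 1]. GSEA divides each ES by the mean magnitude of the same-signed ES values obtained from permuted data, which accounts for differences in gene-set size, so |NES| often exceeds 1. Positive and negative values indicate whether the gene set is enriched at the top (positively correlated) or the bottom (negatively correlated) of the ranked list.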
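To see why a normalized score is not confined to [-1, 1], here is a hedged Python sketch of the normalization step, assuming the GSEA convention of dividing the observed ES by the mean magnitude of the null ES values that share its sign. The function name `normalize_es` is hypothetical, the null values are made-up numbers (in practice they come from permuting phenotype labels or gene sets), and the simple fraction used for the nominal p-value is a simplification.

```python
from statistics import mean

def normalize_es(observed_es, null_es):
    """Normalize an ES by the mean magnitude of same-signed null ES values.

    observed_es: ES of the real gene set
    null_es:     ES values from permuted data
    Returns (nes, nominal_p).
    """
    same_sign = [e for e in null_es if (e >= 0) == (observed_es >= 0)]
    if not same_sign:
        raise ValueError("no null ES values with the same sign")
    # Divide by the mean magnitude so the sign of the observed ES is kept.
    nes = observed_es / mean([abs(e) for e in same_sign])
    # Nominal p: fraction of same-signed null ES at least as extreme.
    nominal_p = sum(abs(e) >= abs(observed_es) for e in same_sign) / len(same_sign)
    return nes, nominal_p


# Made-up null ES values standing in for a permutation distribution
null = [0.20, 0.35, 0.25, -0.30, -0.40, 0.30]
nes, p_nom = normalize_es(0.6, null)
print(round(nes, 2), p_nom)  # → 2.18 0.0
```

Because null ES values are usually smaller in magnitude than the ES of a truly enriched set, the ratio frequently exceeds 1 while keeping the sign of the original ES.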
