Question

Does the Normalized Enrichment Score (NES) from GSEA correlate with gene set upregulation and downregulation?

15 months ago

grsyhhsb ▴ 10

Hello all,

I have a few questions about GSEA's normalized enrichment score.

Let's say we have two conditions, responder to treatment and nonresponder to treatment.

GSEA takes all the genes and ranks them based on how differentially expressed they are between the two groups. Does this ranked list have anything to do with the log2 fold change between responders and nonresponders? It seems that the most differentially expressed genes could be highly expressed in either responders or nonresponders, as long as they separate the two groups?

The enrichment score represents whether the genes in a gene set are more represented at the top of the list or at the bottom. Does this mean that if a gene set has a positive NES, it has more genes that are differentially expressed between the two groups, and if a gene set has a negative NES, the genes in that gene set are not differentially expressed? If that is the case, why do the responders all have positive NES for their significant gene sets while the nonresponders all have negative NES for theirs? And wouldn't a negative NES then mean that the gene set is not interesting for this particular experimental design?

I feel like I misunderstood something. Since the metric is called enrichment score, shouldn't the value give us some information on whether the gene set is expressed higher in one group vs the other? However, based on the explanation, this doesn't seem to be the case.

I appreciate any help you can give me. Thank you very much!

RNA-seq GSEA • 3.0k views

Ram · Answer 1 · 2023-01-20

Hi Ram,

For these kinds of questions you're much more likely to get a quick answer by writing to us directly on the GSEA Help forum: https://groups.google.com/g/gsea-help. I came across this by pure coincidence.

To answer your questions though, a negative enrichment score does not mean "not differentially expressed"! It means "enriched in the control/reference/denominator".

GSEA ranks genes between your two defined phenotypes (responders and non-responders in your example). By default, GSEA (at least the official desktop application) uses the signal-to-noise ratio of one phenotype vs. the other phenotype (see the GSEA user guide for the formula). So, as with log2FC, genes that are upregulated get a positive ranking score and genes that are downregulated get a negative one, where the magnitude of the score roughly depends on the degree of differential expression.
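To make the ranking step concrete, here is a minimal Python sketch of a signal-to-noise ranking, assuming the basic form (mean_A - mean_B) / (sd_A + sd_B). The gene names and expression values are made up, the use of the sample standard deviation is my assumption, and the sketch leaves out any adjustments the user guide applies to the standard deviations, so treat it as an illustration rather than what the desktop application does internally.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Signal-to-noise for one gene: (mean_A - mean_B) / (sd_A + sd_B)."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


# Hypothetical expression values per gene (illustrative numbers only).
responders = {
    "GENE_UP": [9.0, 10.0, 11.0],
    "GENE_DOWN": [2.0, 3.0, 4.0],
    "GENE_FLAT": [5.0, 6.0, 7.0],
}
nonresponders = {
    "GENE_UP": [4.0, 5.0, 6.0],
    "GENE_DOWN": [7.0, 8.0, 9.0],
    "GENE_FLAT": [5.0, 6.0, 7.0],
}

# Responders are the "numerator" phenotype in this comparison.
scores = {g: signal_to_noise(responders[g], nonresponders[g]) for g in responders}

# Most upregulated in responders first, most downregulated last.
ranked = sorted(scores, key=scores.get, reverse=True)
for gene in ranked:
    print(gene, round(scores[gene], 2))
```

Swapping the two arguments to `signal_to_noise` negates every score, which is why the same gene counts as "up" in one direction of the comparison and "down" in the other.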

After the list of all expressed genes is ranked, GSEA computes enrichment scores by walking down the list, incrementing a running score if a gene is in the set and decrementing it if the gene is not in the set. When this hits the midpoint (i.e. genes with zero differential expression) the signs in the running sum flip (you start computing a negative score if genes are in the set). GSEA then takes the set's score at the maximum deviation from zero. This score, the enrichment score, will be positive when the preponderance of set members are upregulated, and negative when the preponderance of set members are downregulated.
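For anyone who wants to see how the sign comes out, here is a minimal Python sketch of the running-sum walk as formulated in the original GSEA paper (Subramanian et al., 2005), with hits weighted by the absolute ranking score raised to an exponent `p` (assumed to be 1 here). The function name, gene names, and scores are hypothetical, and this is not the desktop application's code. Under this formulation a negative score comes from set members concentrated at the bottom of the list: the misses above them drive the sum below zero before the hits arrive.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score: the maximum deviation from zero."""
    in_set = [gene in gene_set for gene in ranked_genes]
    n_miss = in_set.count(False)
    hit_total = sum(
        abs(scores[gene]) ** p for gene, hit in zip(ranked_genes, in_set) if hit
    )
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need set members with non-zero scores and non-members")

    running = 0.0
    best = 0.0
    for gene, hit in zip(ranked_genes, in_set):
        if hit:
            # Step up, weighted by how extreme the gene's ranking score is.
            running += abs(scores[gene]) ** p / hit_total
        else:
            # Step down by a constant amount for every gene outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, responders vs. non-responders (made-up scores).
scores = {"g1": 3.0, "g2": 2.0, "g3": 1.0, "g4": -1.0, "g5": -2.0, "g6": -3.0}
ranked_genes = sorted(scores, key=scores.get, reverse=True)

es_up = enrichment_score(ranked_genes, scores, {"g1", "g2"})
es_down = enrichment_score(ranked_genes, scores, {"g5", "g6"})
print("set at the top of the list:", es_up)       # positive
print("set at the bottom of the list:", es_down)  # negative
```

The NES reported by GSEA additionally rescales this score using permutation results, which the sketch does not attempt.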

This is why you're seeing negative scores for the nonresponders, and it is where the misunderstanding of the methodology lies. GSEA calculates scores that are signed by the direction of the comparison (up is positive, down is negative), but reports results as "Enriched in [A]" or "Enriched in [B]" because the metric of differential expression between the groups is symmetric - if gene 1 is "down" in "phenotype A vs phenotype B", it is "up" if you were to compute "phenotype B vs phenotype A".

In summary:

Positive enrichment scores reflect sets that are enriched in your "numerator" phenotype (they consist predominantly of genes upregulated in responders, in the responders vs. non-responders comparison) and can be traditionally thought of as sets that are upregulated in the comparison you told GSEA to calculate (i.e. responders vs. non-responders).
Negative enrichment scores reflect sets that are enriched in your "denominator" phenotype (they consist predominantly of genes downregulated in responders, in the responders vs. non-responders comparison) and can be traditionally thought of as sets that are downregulated in the comparison you told GSEA to calculate (i.e. responders vs. non-responders).

Hopefully this makes sense. Feel free to reach out to us if you have any additional questions, or if what I said here was unclear in any way!

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
GSEA-MSigDB Team, Mesirov Lab
Department of Medicine
University of California, San Diego
https://GSEA-MSigDB.org/