Hi Ram,
For these kinds of questions, you're much more likely to get a quick answer by writing to us directly on the GSEA Help forum: https://groups.google.com/g/gsea-help. I came across this by pure coincidence.
To answer your questions though, a negative enrichment score does not mean "not differentially expressed"! It means "enriched in the control/reference/denominator phenotype".
GSEA ranks genes between your two defined phenotypes (responders and non-responders in your example). By default, GSEA (at least the official desktop application) uses the signal-to-noise ratio of one phenotype vs. the other phenotype (see the GSEA user guide for the formula). So, as with log2FC, genes that are upregulated get a positive rank and genes that are downregulated get a negative rank, where the magnitude of the rank depends roughly on the degree of differential expression.
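To make the ranking step concrete, here is a minimal sketch in Python of a signal-to-noise metric, assuming the basic form (difference of group means divided by the sum of group standard deviations). The gene names, the toy expression values, and the `signal_to_noise` helper are made up for illustration, and the sketch leaves out any adjustments the desktop application makes to the standard deviations, so check the user guide for the exact formula.

```python
from statistics import mean, stdev

def signal_to_noise(samples_a, samples_b):
    """Basic signal-to-noise ratio of phenotype A vs. phenotype B for one gene.

    Assumption: (mean_A - mean_B) / (sd_A + sd_B), with no floor on the
    standard deviations.
    """
    return (mean(samples_a) - mean(samples_b)) / (stdev(samples_a) + stdev(samples_b))

# Hypothetical expression values: (responders, non-responders) per gene.
expression = {
    "GENE_UP": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "GENE_DOWN": ([2.0, 3.0, 4.0], [8.0, 9.0, 10.0]),
    "GENE_FLAT": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
}

# Responders vs. non-responders: upregulated genes get a positive value.
metric = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}

# Swapping the phenotypes negates every value.
reversed_metric = {gene: signal_to_noise(b, a) for gene, (a, b) in expression.items()}

# Ranked list, most upregulated in responders first.
ranked = sorted(metric, key=metric.get, reverse=True)
print(ranked)
```

With these toy values the gene that is higher in responders ends up at the top of the list, the unchanged gene in the middle, and the gene that is higher in non-responders at the bottom.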
After the list of all expressed genes is ranked, GSEA computes enrichment scores by walking down the list, incrementing a running score if a gene is in the set and decrementing it if the gene is not in the set. The same rule applies along the whole list: if the set members sit mostly in the bottom half (the downregulated genes), the non-members at the top push the running sum below zero before the set members near the bottom bring it back up. GSEA then takes the set's score at the maximum deviation from zero. This score, the enrichment score, will be positive when the preponderance of set members are upregulated, and negative when the preponderance of set members are downregulated.
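A minimal sketch of that walk in Python, assuming the weighted running-sum statistic from the original GSEA paper (hits weighted by the absolute value of the ranking metric, misses weighted equally). The `enrichment_score` function, the gene names, and the metric values are hypothetical, and permutation testing and normalization are left out. In this sketch the increment/decrement rule is the same at every position; a negative score comes from misses accumulating before the hits are reached.

```python
def enrichment_score(ranked_genes, metric, gene_set, p=1.0):
    """Return (ES, running_sum) for one gene set over a ranked gene list.

    ranked_genes: gene names sorted from most positive to most negative metric.
    metric: dict mapping gene name to its ranking metric.
    p: weight exponent (p=1 weights hits by |metric|, p=0 is unweighted).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(metric[g]) ** p for g, h in zip(ranked_genes, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, total = [], 0.0
    for gene, is_hit in zip(ranked_genes, hits):
        if is_hit:
            total += abs(metric[gene]) ** p / hit_total  # step up on a set member
        else:
            total -= 1.0 / n_miss                        # step down otherwise
        running.append(total)

    # The enrichment score is the running-sum value furthest from zero.
    return max(running, key=abs), running

# Hypothetical ranking metric (e.g. responders vs. non-responders).
metric = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
ranked = sorted(metric, key=metric.get, reverse=True)

es_up, _ = enrichment_score(ranked, metric, {"A", "B"})    # members at the top
es_down, _ = enrichment_score(ranked, metric, {"E", "F"})  # members at the bottom
print(es_up, es_down)
```

The set made of the two most upregulated genes gets a positive score and the set made of the two most downregulated genes gets a negative one, even though both sets are equally "differentially expressed".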
This is why you're seeing negative scores for the non-responders, and it is where the misunderstanding of the methodology lies.
GSEA calculates scores that are signed by the direction of the comparison (up is positive, down is negative), but reports results as "Enriched in [A]" or "Enriched in [B]" because the metric of differential expression between the groups is symmetric: if gene 1 is "down" in "phenotype A vs phenotype B", it is "up" if you were to compute "phenotype B vs phenotype A".
In summary:
- Positive enrichment scores reflect sets that are enriched in your "numerator" phenotype (they consist predominantly of genes upregulated in responders, in the responders vs. non-responders comparison) and can be traditionally thought of as sets that are upregulated in the comparison you told GSEA to calculate (i.e. responders vs. non-responders).
- Negative enrichment scores reflect sets that are enriched in your "denominator" phenotype (they consist predominantly of genes downregulated in responders, in the responders vs. non-responders comparison) and can be traditionally thought of as sets that are downregulated in the comparison you told GSEA to calculate (i.e. responders vs. non-responders).
Hopefully this makes sense. Feel free to reach out to us if you have any additional questions, or if what I said here was unclear in any way!
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
GSEA-MSigDB Team, Mesirov Lab
Department of Medicine
University of California, San Diego
https://GSEA-MSigDB.org/
I think you addressed me (the person that last edited the post) and not the person that asked the question (grsyhhsb).