Hello,
Great question on interpreting GSEA results from the MSigDB C6 collection. It's a common point of confusion, especially with the directional suffixes. I'll break this down step by step, drawing on the MSigDB documentation and general GSEA best practices.
1. Primary Interpretation: NES and FDR as the Core Metrics
Yes, you should primarily interpret your results based on the Normalized Enrichment Score (NES) and False Discovery Rate (FDR) values, as these are the quantitative outputs of GSEA designed to assess the strength and significance of pathway enrichment:
- NES: Measures the magnitude and direction of enrichment, normalized for gene set size. A positive NES (like your +1.76 and +1.85) indicates that the gene set is enriched at the top of your ranked gene list, i.e., its genes tend to be upregulated in your phenotype of interest relative to the control, assuming the list is ranked in that direction. A larger absolute value means stronger enrichment; |NES| > 1.5 is a common rule of thumb, not a formal cutoff.
- FDR: Controls for multiple testing across all gene sets. The GSEA documentation suggests FDR < 0.25 for exploratory analyses, and many papers apply a stricter 0.05. Your values (0.04 and 0.03) pass both, so these are statistically significant.
These metrics are computed the same way for every gene set and comparison, making them your starting point for any GSEA result. However, they don't provide biological context on their own. That's where the gene set details (including original publications) come in for deeper validation.
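To make the NES/FDR triage concrete, here is a minimal Python sketch of how you might filter a results table. The `GseaHit` class, the `significant_hits` helper, and the third row are my own illustrative assumptions (they are not part of GSEA's output format); only your two hits come from the question.

```python
from dataclasses import dataclass


@dataclass
class GseaHit:
    gene_set: str
    nes: float
    fdr: float


def significant_hits(hits, fdr_cutoff=0.25, min_abs_nes=0.0):
    """Keep hits below the FDR cutoff, strongest |NES| first."""
    kept = [h for h in hits if h.fdr < fdr_cutoff and abs(h.nes) >= min_abs_nes]
    return sorted(kept, key=lambda h: abs(h.nes), reverse=True)


hits = [
    GseaHit("MYC_UP.V1_DN", 1.76, 0.04),
    GseaHit("VEGF_A_UP.V1_DN", 1.85, 0.03),
    GseaHit("HYPOTHETICAL_SET_UP", -1.10, 0.40),  # made-up row for illustration
]

# Stricter 0.05 cutoff; the default 0.25 is the exploratory threshold.
for h in significant_hits(hits, fdr_cutoff=0.05):
    print(f"{h.gene_set}\tNES={h.nes:+.2f}\tFDR={h.fdr:.2f}")
```

Sorting by absolute NES keeps strongly negative hits near the top as well, which matters once you add the paired _UP sets.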
2. Incorporating Original Reference Publications
Absolutely, you should consult the original publications for each gene set to add biological nuance, especially for C6 signatures which are derived from specific perturbation experiments (e.g., oncogene overexpression or knockdown). This helps confirm if the enrichment aligns with expected biology in your RNA-seq context (e.g., cancer vs. non-cancer, treatment effects).
- How to access: In the MSigDB portal (gsea-msigdb.org), search for the gene set name (e.g., "MYC_UP.V1_DN") to view the full description, member genes, and linked PubMed references. For MYC-related sets, go by the reference listed on the gene set page itself rather than a remembered citation (I had Zeller et al. (2003) on MYC targets in mind, but verify that against the page).
- When to prioritize: If your NES/FDR hits are borderline or if the pathway doesn't intuitively fit your hypothesis, dive into the refs. For strong signals like yours, it's more for storytelling (e.g., in a manuscript) than decision-making.
- Caveat: C6 sets are "curated signatures" from diverse experiments, so they're not as standardized as Hallmark (H) sets. Always cross-check with orthogonal data (e.g., qPCR, Western blots) if possible.
3. Decoding the Suffixes: MYC_UP.V1_DN and VEGF_A_UP.V1_DN
The naming in C6 follows a consistent convention for paired signatures from oncogenic perturbations. Here's the breakdown:
- Base name (e.g., MYC_UP.V1): Identifies the perturbation experiment. "MYC_UP" means MYC itself was upregulated (activated or overexpressed) in the source experiment; V1 is the signature version.
- Suffix (_UP or _DN): Specifies the direction in which the member genes changed in response to that perturbation:
  - _UP: Genes induced/upregulated by the oncogene.
  - _DN: Genes repressed/downregulated by the oncogene.
  Note that the suffix only describes the direction of change; it does not tell you whether a gene is a direct or an indirect target.
Many C6 sets come in _UP/_DN pairs to capture both arms of regulation.
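If it helps to see the convention mechanically, here is a small Python sketch that splits a name into its parts. The regular expression and the `parse_c6_name` helper are my own assumptions based on the pattern described above; some C6 names do not fit this exact shape, in which case the function returns None.

```python
import re

# Assumed pattern: <GENE>_<perturbation UP|DN>.V<version>_<gene direction UP|DN>
_C6_NAME = re.compile(
    r"^(?P<gene>.+)_(?P<perturbation>UP|DN)\.V(?P<version>\d+)_(?P<direction>UP|DN)$"
)


def parse_c6_name(name):
    """Split a C6-style name into its parts, or return None if it does not match."""
    m = _C6_NAME.match(name)
    if m is None:
        return None
    parts = m.groupdict()
    parts["version"] = int(parts["version"])
    return parts


for name in ("MYC_UP.V1_DN", "VEGF_A_UP.V1_DN"):
    print(name, "->", parse_c6_name(name))
```

The first UP/DN describes what was done to the oncogene; the last one describes how the member genes responded.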
Interpretation of Your Positive NES Results
For a positive NES on a _DN set like yours:
- It means the genes in the set (which are normally downregulated by the oncogene) are upregulated in your dataset.
- Biologically, this suggests inhibition or antagonism of the oncogenic pathway: Your phenotype may be "reversing" the repression that MYC (or VEGF_A) would impose, potentially indicating pathway suppression rather than activation.
Specifics for your hits:
- MYC_UP.V1_DN (NES +1.76): These are genes repressed by MYC overexpression (e.g., cell cycle inhibitors or apoptosis inducers that MYC normally shuts down). Upregulation here is consistent with reduced MYC activity: your samples might show MYC pathway inhibition, perhaps due to a treatment, mutation, or context (e.g., tumor suppressor activation). This is the opposite of what you'd expect with MYC amplification.
- VEGF_A_UP.V1_DN (NES +1.85): Genes repressed by VEGF-A signaling (e.g., anti-angiogenic factors that VEGF normally suppresses to promote vessel growth). Upregulation is consistent with an attenuated VEGF-A pathway, i.e., reduced angiogenesis signaling, as you might see with anti-VEGF therapy.
In contrast, a positive NES for the paired _UP set (e.g., MYC_UP.V1_UP) would indicate pathway activation.
| Gene Set | Perturbation | Suffix Meaning | Positive NES Interpretation |
| --- | --- | --- | --- |
| MYC_UP.V1_DN | MYC overexpression | Genes downregulated by MYC | MYC repression reversed -> Pathway inhibited |
| VEGF_A_UP.V1_DN | VEGF-A activation | Genes downregulated by VEGF-A | VEGF-A repression reversed -> Angiogenesis suppressed |
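The sign logic above boils down to multiplying three signs. Here is a rough Python sketch; the `interpret` function and its return labels are hypothetical names of mine, and the result is only a reading aid, not proof of pathway status.

```python
def interpret(perturbation, direction, nes):
    """Combine perturbation (UP/DN), gene direction (UP/DN), and the NES sign.

    Returns a coarse label; a real interpretation still needs the source
    experiment and the leading-edge genes.
    """
    if nes == 0:
        return "no direction"
    sign = 1 if nes > 0 else -1
    sign *= 1 if perturbation == "UP" else -1  # oncogene activated vs. knocked down
    sign *= 1 if direction == "UP" else -1     # member genes induced vs. repressed
    return "pathway-active-like" if sign > 0 else "pathway-inactive-like"


# Your two hits: oncogene UP, member genes DN, positive NES.
print(interpret("UP", "DN", 1.76))  # MYC_UP.V1_DN
print(interpret("UP", "DN", 1.85))  # VEGF_A_UP.V1_DN
```

The same function covers the paired _UP set and knockdown-style signatures, since each flips exactly one of the three signs.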
References and Tips
- MSigDB C6 Docs: Primary source--Oncogenic Signatures. Search your sets there for exact origins.
- Key papers: Subramanian et al. (2005, PNAS) for NES/FDR details; Liberzon et al. (2015, Cell Systems) for MSigDB curation (the Hallmark collection).
- Pro Tip: Visualize with GSEA's Enrichment Plot or tools like fgsea/enrichplot in R to see leading-edge genes. Also, run the paired _UP sets to confirm directionality.
- If your dataset is RNA-seq from cancer, consider integrating with tools like GSVA for score-based validation.
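For the paired-set check mentioned in the tips, a quick sanity test is whether the _UP and _DN halves of a signature move in opposite directions. A minimal Python sketch follows; `paired_sets_agree` is a hypothetical helper, and the MYC_UP.V1_UP value is a made-up placeholder, since you haven't reported that result.

```python
def paired_sets_agree(nes_by_set, base):
    """True if the _UP and _DN sets for `base` have opposite NES signs.

    Returns None when either half of the pair is missing from the results.
    """
    up = nes_by_set.get(base + "_UP")
    dn = nes_by_set.get(base + "_DN")
    if up is None or dn is None:
        return None
    return up * dn < 0


nes_by_set = {
    "MYC_UP.V1_DN": 1.76,     # from your results
    "MYC_UP.V1_UP": -1.60,    # placeholder value, NOT a real result
    "VEGF_A_UP.V1_DN": 1.85,  # from your results; paired set not run yet
}

print(paired_sets_agree(nes_by_set, "MYC_UP.V1"))
print(paired_sets_agree(nes_by_set, "VEGF_A_UP.V1"))
```

If both halves come out with the same sign, I'd treat the directional interpretation with caution and look at the leading-edge genes before drawing conclusions.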
Hope this clarifies things.
Kevin