How should I interpret GSEA results for the C6: Oncogenic Signatures collection (e.g., MYC_UP.V1_DN)?
5 days ago
Laura • 0

Hello everyone,

I’m performing a GSEA analysis using the C6: Oncogenic Signatures collection from MSigDB to explore transcriptional.

I obtained several significantly enriched gene sets, including:

MYC_UP.V1_DN (NES = +1.76, FDR = 0.04)

VEGF_A_UP.V1_DN (NES = +1.85, FDR = 0.03)

I have two main questions:

Should these results be interpreted primarily based on the NES and FDR values, or should I also take into account the original reference publication associated with each gene set?

Specifically, I’m unsure how to interpret the suffixes in gene set names such as MYC_UP.V1_DN. Does “DN” indicate that genes normally downregulated by MYC are upregulated in my dataset, potentially suggesting inhibition of the MYC-related pathway?

Any clarification or references on how to interpret C6 oncogenic signature results would be greatly appreciated.

Thank you in advance!

RNA-seq Oncogenic MSigDB GSEA pathways • 337 views
2 days ago

Hello,

Great question on interpreting GSEA results from the MSigDB C6 collection. It's a common point of confusion, especially with the directional suffixes. I'll break this down step by step, drawing on the MSigDB documentation and general GSEA best practices.

1. Primary Interpretation: NES and FDR as the Core Metrics

Yes, you should primarily interpret your results based on the Normalized Enrichment Score (NES) and False Discovery Rate (FDR) values, as these are the quantitative outputs of GSEA designed to assess the strength and significance of pathway enrichment:

  • NES: Measures the magnitude and direction of enrichment. A positive NES (like your +1.76 and +1.85) indicates that the gene set is enriched at the top of your ranked gene list. If the list is ranked so that positive scores mean higher expression in your phenotype of interest than in the control, the genes in the set tend to be coordinately upregulated in that phenotype. The absolute value reflects the strength; as a rule of thumb, |NES| > 1.5 is often treated as a strong signal, but it is not a formal cutoff.
  • FDR: Controls for multiple testing across all gene sets. Your FDR q-values (0.04 and 0.03) pass both the strict 0.05 cutoff and the more lenient 0.25 cutoff that the GSEA documentation suggests for exploratory analyses, so both sets are significant by either standard.
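To make "enriched at the top of the ranked list" concrete, here is a minimal sketch of the running-sum statistic behind the enrichment score, following the weighted scheme described in Subramanian et al. (2005). The gene names and scores are made-up toy data, and the function name is my own. The real NES additionally divides the ES by the mean of a permutation null, which is not shown here.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Signed maximum deviation from zero of a GSEA-style running sum.

    ranked_genes: gene names sorted from most up to most down.
    scores: ranking metric for each gene, in the same order.
    gene_set: set of member gene names.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = hits.count(False)
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up (weighted by the score) on a member, step down otherwise.
        running += abs(s) ** p / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical genes): positive score = up in phenotype.
genes = [f"g{i}" for i in range(1, 11)]
metric = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]

es_top = enrichment_score(genes, metric, {"g1", "g2", "g3"})
es_bottom = enrichment_score(genes, metric, {"g8", "g9", "g10"})
print(f"set at top: ES = {es_top:+.2f}; set at bottom: ES = {es_bottom:+.2f}")
```

A set whose members cluster at the top of the list gets a positive score and a set clustered at the bottom gets a negative one, which is all the sign of the NES tells you.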

These metrics are the starting point for any GSEA result, whatever the phenotype. However, they don't provide biological context on their own; that's where the gene set details (including original publications) come in for deeper validation.

2. Incorporating Original Reference Publications

Absolutely, you should consult the original publications for each gene set to add biological nuance, especially for C6 signatures which are derived from specific perturbation experiments (e.g., oncogene overexpression or knockdown). This helps confirm if the enrichment aligns with expected biology in your RNA-seq context (e.g., cancer vs. non-cancer, treatment effects).

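  • How to access: In the MSigDB portal (gsea-msigdb.org), search for the gene set name (e.g., "MYC_UP.V1_DN") to view the full description, member genes, and linked PubMed references. Read the description carefully, because it states the experimental system. The MYC_UP.V1 pair, for example, is annotated as coming from primary breast epithelial cell cultures overexpressing MYC (Bild et al., 2006, Nature) rather than from a generic list of MYC targets; confirm the details on the gene set page.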
  • When to prioritize: If your NES/FDR hits are borderline or if the pathway doesn't intuitively fit your hypothesis, dive into the refs. For strong signals like yours, it's more for storytelling (e.g., in a manuscript) than decision-making.
  • Caveat: C6 sets are "curated signatures" from diverse experiments, so they're not as standardized as Hallmark (H) sets—always cross-check with orthogonal data (e.g., qPCR, Western blots) if possible.
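If you download the collection as a GMT file, you can also inspect the member genes programmatically. A minimal sketch, assuming the standard GMT layout (tab-separated: set name, description, then one gene per column). The gene names and descriptions below are placeholders, not the real members of these sets, and `read_gmt` is my own helper name.

```python
import io


def read_gmt(handle):
    """Parse GMT lines into {set_name: set_of_genes}."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Placeholder content standing in for a downloaded C6 .gmt file.
toy_gmt = io.StringIO(
    "MYC_UP.V1_DN\tplaceholder description\tGENE_A\tGENE_B\tGENE_C\n"
    "MYC_UP.V1_UP\tplaceholder description\tGENE_D\tGENE_E\n"
)
sets = read_gmt(toy_gmt)

# Which members of the _DN set were actually measured in your data?
measured = {"GENE_A", "GENE_C", "GENE_E", "GENE_Z"}
overlap = sets["MYC_UP.V1_DN"] & measured
print(sorted(overlap))
```

With a real file you would pass `open("your_file.gmt")` instead of the in-memory string. Checking the overlap is worthwhile because a set with few measured members gives a less reliable enrichment estimate.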

3. Decoding the Suffixes: MYC_UP.V1_DN and VEGF_A_UP.V1_DN

The naming in C6 follows a consistent convention for paired signatures from oncogenic perturbations. Here's the breakdown:

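  • Base name (e.g., MYC_UP.V1): Describes the perturbation, not the member genes. "MYC_UP" means MYC itself was turned up (overexpressed) in the source experiment; a base name ending in _DN (e.g., PTEN_DN) means the named gene was knocked down. "V1" distinguishes this signature from other signatures for the same gene.
  • Final suffix (_UP or _DN): Specifies how the member genes responded to that perturbation:
    • _UP: Genes induced/upregulated by the perturbation (for MYC_UP.V1_UP, genes that go up when MYC is overexpressed).
    • _DN: Genes repressed/downregulated by the perturbation (for MYC_UP.V1_DN, genes that go down when MYC is overexpressed). The signature does not distinguish direct from indirect targets.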

Many C6 sets come in _UP/_DN pairs to capture both arms of regulation.
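The naming pattern can be taken apart mechanically. This is only a sketch for names shaped like the ones in this thread (base name, then ".V" plus a number, then a final _UP or _DN); `parse_c6_name` and the field names are my own, and not every set in the collection is guaranteed to follow this pattern.

```python
import re
from typing import NamedTuple


class C6Name(NamedTuple):
    perturbation: str    # what was done in the source experiment, e.g. "MYC_UP"
    version: str         # signature version tag, e.g. "V1"
    gene_direction: str  # how the member genes responded: "UP" or "DN"


_C6_PATTERN = re.compile(
    r"^(?P<perturbation>.+)\.(?P<version>V\d+)_(?P<direction>UP|DN)$"
)


def parse_c6_name(name):
    """Split a C6-style gene set name into its three parts."""
    match = _C6_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name!r}")
    return C6Name(match["perturbation"], match["version"], match["direction"])


for set_name in ("MYC_UP.V1_DN", "VEGF_A_UP.V1_DN"):
    print(set_name, "->", parse_c6_name(set_name))
```

The point of splitting it this way is that the "UP" inside the base name and the final "_DN" answer two different questions: what was done to the cells, and which way the member genes moved.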

Interpretation of Your Positive NES Results

For a positive NES on a _DN set like yours:

  • It means the genes in the set (which are downregulated when the oncogene is overexpressed) are upregulated in your phenotype of interest, assuming your ranking puts genes that are higher in that phenotype at the top.
  • Biologically, this is consistent with inhibition or antagonism of the oncogenic pathway: your phenotype may be "reversing" the repression that MYC (or VEGF-A) would impose, pointing to pathway suppression rather than activation. Treat it as indirect evidence, not proof of reduced oncogene activity.

Specifics for your hits:

  • MYC_UP.V1_DN (NES +1.76): These are genes repressed by MYC overexpression (they may include cell cycle inhibitors or apoptosis inducers that MYC shuts down; check the member list). Upregulation here is consistent with reduced MYC activity: your samples might show MYC pathway inhibition, perhaps due to a treatment, mutation, or context (e.g., tumor suppressor activation). This is the opposite of what you'd expect with MYC amplification.
  • VEGF_A_UP.V1_DN (NES +1.85): Genes repressed by VEGF-A signaling (possibly including anti-angiogenic factors; again, check the member list). Upregulation is consistent with an attenuated VEGF-A pathway, such as reduced angiogenesis signaling, which can occur with anti-VEGF therapies.

In contrast, a positive NES for the paired _UP set (e.g., MYC_UP.V1_UP) would indicate pathway activation.

| Gene Set | Perturbation | Suffix Meaning | Positive NES Interpretation |
| --- | --- | --- | --- |
| MYC_UP.V1_DN | MYC overexpression | Genes downregulated by MYC | MYC repression reversed -> Pathway inhibited |
| VEGF_A_UP.V1_DN | VEGF-A activation | Genes downregulated by VEGF-A | VEGF-A repression reversed -> Angiogenesis suppressed |
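The logic of combining the final suffix with the sign of the NES can be written down as a small helper. This is a sketch under two assumptions: the ranking puts genes that are higher in the phenotype of interest at the top, and the result is phrased relative to the perturbation (which may be an overexpression or a knockdown). The function names are my own, and the NES of -1.60 for the _UP set is a hypothetical value for illustration, not something from the question.

```python
def interpret(gene_set_name, nes):
    """Say whether the data resembles or opposes the source perturbation."""
    _base, sep, direction = gene_set_name.rpartition("_")
    if not sep or direction not in ("UP", "DN"):
        raise ValueError(f"expected a name ending in _UP or _DN: {gene_set_name!r}")
    if nes == 0:
        return "no directional enrichment"
    # _UP genes going up, or _DN genes going down, mimic the perturbation.
    mimics = (direction == "UP") == (nes > 0)
    return "resembles the perturbation" if mimics else "opposes the perturbation"


def pair_is_concordant(nes_up, nes_dn):
    """A coherent pathway shift moves the _UP and _DN sets in opposite directions."""
    return nes_up * nes_dn < 0


print(interpret("MYC_UP.V1_DN", 1.76))   # observed in the question
print(interpret("MYC_UP.V1_UP", -1.60))  # hypothetical paired result
print(pair_is_concordant(nes_up=-1.60, nes_dn=1.76))
```

For MYC_UP.V1, "opposes the perturbation" means opposing MYC overexpression. For a knockdown-based signature, the same label would point the other way biologically, so read the base name before drawing a conclusion.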

References and Tips

  • MSigDB C6 Docs: The primary source is the Oncogenic Signatures collection page on gsea-msigdb.org. Search your sets there for exact origins.
  • Key papers: Subramanian et al. (2005, PNAS) for NES/FDR details; Liberzon et al. (2011, Bioinformatics) for MSigDB curation.
  • Pro Tip: Visualize with GSEA's Enrichment Plot or tools like fgsea/enrichplot in R to see leading-edge genes. Also, run the paired _UP sets to confirm directionality.
  • If your dataset is RNA-seq from cancer, consider integrating with tools like GSVA for score-based validation.
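For intuition on what "leading-edge genes" are: for a positively enriched set, they are the members ranked at or before the peak of the running sum; for a negatively enriched set, the members at or after the trough. Below is a minimal sketch on toy data (hypothetical gene names and scores, my own function name). In practice GSEA and fgsea report the leading edge for you.

```python
def leading_edge(ranked_genes, scores, gene_set, p=1.0):
    """Return set members that contribute to the peak of the running sum."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = hits.count(False)
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, peak, peak_idx = 0.0, 0.0, 0
    for i, (s, h) in enumerate(zip(scores, hits)):
        running += abs(s) ** p / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(peak):
            peak, peak_idx = running, i

    if peak >= 0:
        positions = range(0, peak_idx + 1)            # up to and including the peak
    else:
        positions = range(peak_idx, len(ranked_genes))  # from the trough onward
    return [ranked_genes[i] for i in positions if hits[i]]


# Toy ranked list: two members near the top, one straggler near the bottom.
toy_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
toy_scores = [4, 3, 2, 1, -1, -2, -3, -4]
edge = leading_edge(toy_genes, toy_scores, {"g1", "g3", "g7"})
print(edge)
```

The straggler near the bottom belongs to the set but not to the leading edge, which is why the leading-edge subset is usually the most informative place to look for the genes driving a hit.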

Hope this clarifies things.

Kevin
