How to add gene lable on GSEA plot
0
2
Entering edit mode
13 days ago
Emy Alade ▴ 20

Hello, I performed a Gene Set Enrichment Analysis (GSEA) using the clusterProfiler package in R on a ranked gene list. I want to display the genes names directly on the enrichment plot (like in the example picture below). enter image description here

library(readr)
library(ggplot2)
library(dplyr)
library(clusterProfiler)
library(biomaRt)
library(org.Hs.eg.db)
library(patchwork)
library(msigdbr)
library(enrichplot)

# Perform GSEA using gseGO function on the ranked gene list
gseGO_result <- gseGO(
                geneList = gene_list_unique,
                ont = "BP",                  # Use Biological Process ontology
                OrgDb = org.Hs.eg.db,        # Human gene annotation database
                keyType = "ENTREZID",        # Key type for gene IDs
                pvalueCutoff = 0.05,         # p-value cutoff for significant terms
                pAdjustMethod = "BH",        # Multiple testing correction method (Benjamini-Hochberg)
                minGSSize = 10,              # Minimum gene set size
                maxGSSize = 500,             # Maximum gene set size
                verbose = TRUE,
                seed = TRUE                  # Set seed for reproducibility
              )

# Simplify GSEA results by removing redundant GO terms based on similarity cutoff
gseGO_result2 <- clusterProfiler::simplify(
                  gseGO_result,
                  cutoff = 0.7,               # Similarity cutoff to merge redundant terms
                  by = "p.adjust",            # Use adjusted p-value for grouping
                  select_fun = min            # Select term with minimal adjusted p-value
                )

# Convert ENTREZ IDs to gene symbols for easier interpretation
gene_df <- bitr(
          names(gene_list_unique),
          fromType = "ENTREZID",
          toType = "SYMBOL",
          OrgDb = org.Hs.eg.db
        )

# Add gene symbols to the core enrichment column of the gseGO results
gseGO_result2@result$core_enrichment_symbol <- sapply(
  gseGO_result2@result$core_enrichment,
  function(x) {
    entrez_ids <- unlist(strsplit(x, "/"))
    symbols <- gene_df$SYMBOL[match(entrez_ids, gene_df$ENTREZID)]
    paste(symbols, collapse = "/")
  }
)


plot <- gseaplot2(gseGO_result2,
                  geneSetID = "GO:0071824",
                  pvalue_table = TRUE,
                  color =  "#86B875",
                 ES_geom = "line")

# Extraire les gènes "leading edge"
genes <- gseGO_result2@result[gseGO_result2@result$Description == "protein-DNA complex assembly", "core_enrichment"]
genes <- unlist(strsplit(genes, "/"))
deg_in_list <- sigDEGs_ids[sigDEGs_ids$ENTREZID %in% genes, ]
deg_in_list$rank <- match(deg_in_list$ENTREZID, names(gene_list_unique))
label_df <- na.omit(deg_in_list)
p3 <- plot[[3]]

# Préparer le dataframe pour les petites barres bleues à y=0
bars_df <- data.frame(
  x = label_df$rank,
  y = 0,
  xend = label_df$rank,
  yend = gene_list_unique[label_df$rank]
)

# Construire le plot
plot[[3]] <- p3 +  
  # Barres grises de tous les gènes
  geom_segment(data = data.frame(
                 x = seq_along(gene_list_unique),
                 y = 0,
                 xend = seq_along(gene_list_unique),
                 yend = gene_list_unique),
               aes(x = x, y = y, xend = xend, yend = yend),
               color = "grey70") +
  # Ligne horizontale à y=0
  geom_hline(yintercept = 0, linetype = "dashed") +
  # Petites barres bleues sur la baseline pour les gènes annotés
  geom_segment(data = bars_df,
               aes(x = x, y = y, xend = xend, yend = yend),
               color = "blue",
               size = 0.5) +
  # Texte annoté avec flèches partant de y=0 vers le texte au-dessus
  geom_text_repel(data = label_df,
                  aes(x = rank, y = 0, label = SYMBOL),
                  nudge_y = diff(range(gene_list_unique)) * 0.07,  # ajuster la hauteur du texte
                  angle = 90,
                  size = 3.5,
                  segment.size = 0.3,
                  segment.curvature = 0,
                  segment.ncp = 10,
                  segment.angle = 20) +
  labs(x = "Rank in ordered dataset", y = "Ranked list metric") +
  theme_minimal()

# Affichage avec patchwork, augmenter la hauteur du 3e plot si besoin
plot[[1]] / plot[[2]] / plot[[3]] + plot_layout(heights = c(3, 1, 3))

enter image description here

gseaplot R • 487 views
ADD COMMENT
0
Entering edit mode

Hi Emy, it looks like from your worked example, you've managed to label some genes. What specifically would you like help with?

ADD REPLY
0
Entering edit mode

I want the genes to be clearly readable and not overlapping (as in the first image). Normally, five genes should be displayed

ADD REPLY
0
Entering edit mode

You're more or less there...

Take a look at the ggrepel example below:
https://ggrepel.slowkow.com/articles/examples#limit-labels-to-a-specific-area

What I would recommend is that you select your genes of interest and feed them in via label_df then use a combination of geom_text_repel(xlim = ..., ylim = ..., nudge_x = ..., nudge_y = ...) to correctly define the regions that ggrepel is able to use. You may need to expand the y-axis limits for the plot go provide more space for the labels if you plan to use genes with very long names.

As an aside, I assume you want to code this rather than fix it in Illustrator because you plan to be generating many of these plots in a batch. I would still say there is an upper limit to how many of these plots are useful to show in this way. If you are going to be showing more than 3 or 4 of these, then maybe you could consider an alternative visualisation?

If you're able to share gene_list_unique, bars_df, label_df and gseGO_result2, or a dummy example of each I could help you with some code.

ADD REPLY

Login before adding your answer.

Traffic: 1843 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6