remove the letter a from a biplot in ggplot
12 weeks ago
Assa Yeroslaviz ★ 1.9k

I'm trying to create a biplot in ggplot2 and managed to do it after a lot of trial and error, but I can't get rid of the a in the legend. I think the difficulty arises from the fact that the data set is not set globally, but inside the geom_segment() command, and I can't find a way around it. The code and its output are below.

In the legend I get the a with the colours. I would like to know how I can get rid of the a and how I can set specific colours for these groups.

Thanks in advance.

library(ggplot2)
library(ggrepel)  # provides geom_text_repel()

data <- matrix(rnorm(60), nrow = 10, ncol = 6)  # a 10 x 6 matrix needs 60 values
colnames(data) <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")

pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

scores <- as.data.frame(pca_result$x)
scores$sample <- rownames(scores)

loadings <- as.data.frame(pca_result$rotation)
loadings$variable <- rownames(loadings)
loadings$group <- rep(c("group1", "group2", "group3"), each = 2)

explained_variance <- summary(pca_result)$importance[2, ]
percent_var_PC1 <- round(explained_variance[1] * 100, 1)
percent_var_PC2 <- round(explained_variance[2] * 100, 1)

ggplot() +
  # Plot the scores (samples)
  geom_point(data = scores, aes(x = PC1, y = PC2), color = "#0072B2", size = 3) +
  geom_text_repel(data = scores, aes(x = PC1, y = PC2, label = sample), color = "#0072B2", size = 2) +

  # Plot the loadings (variables) as arrows
  geom_segment(data = loadings, aes(x = 0, y = 0, xend = PC1*5, yend = PC2*5), 
               arrow = arrow(length = unit(0.3, "cm")), color = "grey") +
  geom_text_repel(data = loadings, aes(x = PC1*5, y = PC2*5, label = variable, color = group), size = 3) +
  # Add axis labels and title
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(), axis.line = element_line(colour = "black")) +
  labs(title = "test PCA", 
       x = paste0("PC1 (", percent_var_PC1, "%)"), 
       y = paste0("PC2 (", percent_var_PC2, "%)")
  )

PCA plot

12 weeks ago
DGTool ▴ 290

In recent ggplot2 versions, geom_text() (and other related functions) has a key_glyph parameter that changes what is displayed in the legend. To change the key from the a to just a coloured point, use key_glyph = "point". I don't know whether geom_text_repel() or geom_segment() accepts a similar parameter, but it might be something to explore and try out. (From the ggplot2 reference: https://ggplot2.tidyverse.org/reference/draw_key.html)
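
As a minimal sketch of that suggestion, assuming a ggplot2 version that supports key_glyph: the data frame demo and the object name p_key below are made up for illustration and are not part of the original question.

```r
library(ggplot2)

# Toy data, invented for illustration only
demo <- data.frame(
    x = c(1, 2, 3),
    y = c(2, 1, 3),
    label = c("gene1", "gene2", "gene3"),
    group = c("group1", "group2", "group3")
)

# key_glyph = "point" asks the text layer to draw its legend key as a
# point instead of the default letter "a"
p_key <- ggplot(demo, aes(x = x, y = y, label = label, color = group)) +
    geom_text(key_glyph = "point")
```

Printing p_key draws the plot. Without the key_glyph argument, the same layer should show the a in each legend key.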


Thank you, and yes, it does accept it. I just changed the line to geom_text_repel(data = loadings, aes(x = PC1*5, y = PC2*5, label = variable, color = group), size = 3, key_glyph = "point") + and it worked.
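
For the second part of the question (specific colours per group), the same layer can probably be combined with scale_color_manual(). The sketch below is only an illustration: loadings_demo is a made-up stand-in for the loadings table, and the colour values are arbitrary.

```r
library(ggplot2)
library(ggrepel)

# Made-up stand-in for the loadings table (values are illustrative only)
loadings_demo <- data.frame(
    PC1 = c(0.4, -0.3, 0.1, 0.5, -0.2, 0.3),
    PC2 = c(0.2, 0.5, -0.4, -0.1, 0.3, -0.5),
    variable = paste0("gene", 1:6),
    group = rep(c("group1", "group2", "group3"), each = 2)
)

p_demo <- ggplot(
    loadings_demo,
    aes(x = PC1, y = PC2, label = variable, color = group)
) +
    # Point-shaped legend keys instead of the letter "a"
    geom_text_repel(size = 3, key_glyph = "point") +
    # Arbitrary example colours, one per group
    scale_color_manual(
        values = c(group1 = "red", group2 = "blue", group3 = "green")
    )
```

Naming the values by group level ties each colour to its group regardless of the order in which the levels appear.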

12 weeks ago
kalavattam ▴ 280

The a in the legend comes from how ggrepel::geom_text_repel() draws its legend key: a text layer uses the letter a as its default key. Because you map the color aesthetic to group in the loadings data, ggplot2 automatically creates a legend for that aesthetic, and the a appears because the legend represents the text, not the points or arrows.

We can adjust this by suppressing the legend for ggrepel::geom_text_repel().

...and how I can set specific colours to this groups.

We can assign colors to the groups using scale_color_manual().

Finally, if you want a legend that displays point glyphs instead of text (a), you can add a geom_point() layer specifically for the loadings and map the color aesthetic to the group.

I've updated your code example with the above in mind:

#!/usr/bin/env Rscript

library(ggplot2)
library(ggrepel)

set.seed(24)  # Set seed for reproducibility

data <- matrix(rnorm(60), nrow = 10, ncol = 6)  # a 10 x 6 matrix needs 60 values
colnames(data) <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6")

pca_result <- prcomp(data, center = TRUE, scale. = TRUE)

scores <- as.data.frame(pca_result$x)
scores$sample <- rownames(scores)

loadings <- as.data.frame(pca_result$rotation)
loadings$variable <- rownames(loadings)
loadings$group <- rep(c("group1", "group2", "group3"), each = 2)

explained_variance <- summary(pca_result)$importance[2, ]
percent_var_PC1 <- round(explained_variance[1] * 100, 1)
percent_var_PC2 <- round(explained_variance[2] * 100, 1)

p <- ggplot() +
    #  Plot the scores (samples)
    geom_point(
        data = scores,
        aes(x = PC1, y = PC2),
        color = "#0072B2",
        size = 3
    ) +
    ggrepel::geom_text_repel(
        data = scores,
        aes(x = PC1, y = PC2, label = sample),
        color = "#0072B2",
        size = 2
    ) +

    #  Plot the loadings (variables) as arrows
    geom_segment(
        data = loadings,
        aes(x = 0, y = 0, xend = PC1 * 5, yend = PC2 * 5),
        arrow = arrow(length = unit(0.3, "cm")),
        color = "grey"
    ) +
    ggrepel::geom_text_repel(
        data = loadings,
        aes(x = PC1 * 5, y = PC2 * 5, label = variable, color = group),
        size = 3,
        show.legend = FALSE  # Suppress the legend for text labels
    ) +

    #  Add points for the loadings with color mapped to group for legend
    geom_point(
        data = loadings,
        aes(x = PC1 * 5, y = PC2 * 5, color = group),
        size = 3
    ) +

    #  Add axis labels and title
    theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black")
    ) +
    labs(
        title = "test PCA", 
        x = paste0("PC1 (", percent_var_PC1, "%)"), 
        y = paste0("PC2 (", percent_var_PC2, "%)")
    ) +

    #  Manually set colors for the groups; change the below colors to
    #+ whatever you want
    scale_color_manual(
        values = c("group1" = "red", "group2" = "blue", "group3" = "green")
    )

p