Adding statistics to a ggplot in R; ggpubr can't find column
2.0 years ago
alebaars_98 ▴ 10

Hello biostars,

Currently, I'm working on my master's thesis, and I got feedback on one of my plots suggesting that I annotate the boxplot with the statistical comparisons between my groups using ggpubr. I've been following the tutorial on Datanovia, but for some reason the function can't find one of my columns.

I've made a subset of my data here:

TSS_TE_subset <- data.frame(
  genome = factor(c("A1","A1","A1","A1","A1","D1","D1","D1","D1","D1","JR2","JR2","JR2","JR2","JR2"), levels=c("A1","JR2","D1")),
  distance = c(3299, 2999, 2117, 4228, 2565, 3260, 2515, 578, 1893, 612, 1333, 771, 2093, 1886, 192))

I want to compare my groups with a Wilcoxon test, using JR2 as the reference group:

TSS_TE_subset_stat <- wilcox_test(TSS_TE_subset, distance ~ genome, ref.group="JR2") %>% add_significance()

Which creates a table looking like this:

# A tibble: 2 x 9
  .y.      group1 group2    n1    n2 statistic     p p.adj p.adj.signif
  <chr>    <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <chr>
1 distance JR2    A1         5     5         0 0.008 0.016 * 
2 distance JR2    D1         5     5         9 0.548 0.548 ns

Which looks very much like it is supposed to look, as far as I know. However, when I use my table with data and this statistics table to create a boxplot with the P-values, it doesn't work:

TSS_TE_subset_boxplot <- ggplot(TSS_TE_subset, aes(x=genome, y=distance, fill=genome)) +
  geom_boxplot() +
  theme_classic() +
  labs(
    x="Verticillium genome",
    y="TSS-TE distance",
    title="A upregulated"
  ) +
  scale_fill_manual(values=genome_colors) +
  stat_pvalue_manual(TSS_TE_subset_stat, label = "{p.adj}", tip.length = 0.01, y.position=5000) +
  geom_jitter(width=0.4, height=0, shape=".", alpha=0.4)

> TSS_TE_subset_boxplot
Error in FUN(X[[i]], ...) : object 'genome' not found

Now I can see it says it cannot find something called "genome," which is a column in my dataframe. I know it's there:

> glimpse(TSS_TE_subset)
Rows: 15
Columns: 2
$ genome   <fct> A1, A1, A1, A1, A1, D1, D1, D1, D1, D1, JR2, JR2, JR2, JR2, JR2
$ distance <dbl> 3299, 2999, 2117, 4228, 2565, 3260, 2515, 578, 1893, 612, 1333, 771, 2093, 1886, 192

When looking a bit into it, I found some people grouping the data before doing the statistics, but that created a very similar error even before making the plot:

TSS_TE_subset_stat <- dplyr::group_by(TSS_TE_subset, genome) %>% wilcox_test(distance ~ genome, ref.group="JR2") %>% add_significance()
Error in `mutate()`:
! Problem while computing `data = map(.data$data, .f, ...)`.
Caused by error in `stop_subscript()`:
! Can't extract columns that don't exist.
x Column `genome` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.

Honestly, I'm at a loss here. There must be something wrong with my data, but I cannot figure out what. glimpse() and class() confirm that the data frame and its columns are valid. RStudio has no issues displaying the data. And when I generate a boxplot without ggpubr, it works fine:

TSS_TE_subset_stat_2 <- kruskal.test(distance ~ genome, TSS_TE_subset)$p.value

TSS_TE_subset_boxplot_2 <- ggplot(TSS_TE_subset, aes(x=genome, y=distance, fill=genome)) +
  geom_boxplot() +
  theme_classic() +
  labs(
    x="Verticillium genome",
    y="TSS-TE distance",
    title="A upregulated",
    subtitle=paste0("Kruskal-Wallis P-value: ",as.character(signif(TSS_TE_subset_stat_2, digits=5)))
  ) +
  scale_fill_manual(values=genome_colors) +
  geom_jitter(width=0.4, height=0, shape=".", alpha=0.4)

Working plot, as you can see

But that doesn't show the individual comparisons to the reference, because the Kruskal-Wallis test compares all three groups at once rather than comparing each of the other two to the reference strain.

Does anyone know what is going wrong and how to fix it, or to add P-values to a boxplot in a similar way? Many thanks in advance.

Also, in case that may be the culprit:

  • R version 4.1.1
  • RStudio version 1.4.1717
  • Windows 10

Modified from here (http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/):

library(ggplot2)
library(ggpubr)

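# Make JR2 the first factor level so it is drawn first on the x-axis
TSS_TE_subset$genome <- relevel(TSS_TE_subset$genome, "JR2")
# Pairwise comparisons against the reference genome JR2
my_comparisons <- list(c("JR2", "A1"), c("JR2", "D1"))

ggboxplot(TSS_TE_subset, x = "genome", y = "distance",
          color = "genome", palette = "jco") +
  # Brackets with pairwise p-values (Wilcoxon test is the default method)
  stat_compare_means(comparisons = my_comparisons) +
  # Global p-value across all groups (Kruskal-Wallis by default), drawn at y = 50
  stat_compare_means(label.y = 50)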

boxplot
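
On the original error: a likely cause is aesthetic inheritance. In the question, fill=genome (along with x and y) is set in the top-level ggplot() call, so every layer inherits it, including the one added by stat_pvalue_manual(). That layer uses the statistics table as its data, and that table has no genome column, hence "object 'genome' not found". Below is a minimal sketch of one way around it, assuming that diagnosis is right: the aesthetics are mapped inside the boxplot and jitter layers instead of globally. The colours in genome_colors are hypothetical (the original palette is not shown in the thread), and the bracket heights in y.position are arbitrary values chosen to sit above the data.

```r
library(ggplot2)
library(ggpubr)
library(rstatix)

TSS_TE_subset <- data.frame(
  genome = factor(rep(c("A1", "D1", "JR2"), each = 5),
                  levels = c("A1", "JR2", "D1")),
  distance = c(3299, 2999, 2117, 4228, 2565, 3260, 2515, 578,
               1893, 612, 1333, 771, 2093, 1886, 192)
)

# Hypothetical palette: genome_colors is not defined in the thread
genome_colors <- c(A1 = "#1b9e77", JR2 = "#d95f02", D1 = "#7570b3")

# Pairwise Wilcoxon tests against the reference group JR2
TSS_TE_subset_stat <- add_significance(
  wilcox_test(TSS_TE_subset, distance ~ genome, ref.group = "JR2")
)

# One bracket height per comparison (arbitrary values above the data)
TSS_TE_subset_stat$y.position <- c(4600, 5000)

# No global aes(): each layer maps only the columns its own data contains
TSS_TE_subset_boxplot <- ggplot(TSS_TE_subset) +
  geom_boxplot(aes(x = genome, y = distance, fill = genome)) +
  geom_jitter(aes(x = genome, y = distance),
              width = 0.4, height = 0, shape = ".", alpha = 0.4) +
  theme_classic() +
  labs(
    x = "Verticillium genome",
    y = "TSS-TE distance",
    title = "A upregulated"
  ) +
  scale_fill_manual(values = genome_colors) +
  stat_pvalue_manual(TSS_TE_subset_stat, label = "{p.adj}", tip.length = 0.01)

print(TSS_TE_subset_boxplot)
```

The group_by() attempt in the question probably fails for a related reason: as far as I understand, rstatix runs the test separately within each group, and the grouping column is no longer available inside each subset, so a formula that uses the same column (distance ~ genome) cannot find it. Grouping is meant for a second variable, not for the one being compared.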


That works great! I have adapted it into the figure that I want to create. Thanks for your help!
