Question

change order of violin plots in ggplot2

0

5 weeks ago

Matteo Ungaro ▴ 100

Hi there, I'm facing the following problem: I need to custom-order the violin plots for my populations as follows: AFR, EUR, MENA, SAS, CEA, SIB, OCE and AME. However, for some reason R doesn't seem to apply my mutate(population_ID = factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME"))) %>% line, which should do exactly that.

Below is how the plot currently looks:

[Image: violin_snps]

and the corresponding code:

library(grid)
library(ragg)
library(Cairo)
library(ggh4x)
library(readr)
library(dplyr)
library(readxl)
library(tibble)
library(scales)
library(ggpubr)
library(gtable)
library(ggplot2)
library(hrbrthemes)
library(reticulate)
library(colorspace)
library(introdataviz)

variants_dist <- read_excel("path/to/file.xlsm", 10)
df_var = variants_dist %>% group_by(population_ID) %>% summarise(num=n())

### PLOT THE DATA
variants_dist %>%
  left_join(df_var) %>%
  mutate(population_ID = factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME"))) %>%
  mutate(pop_count = paste0(population_ID, "\n", "n=", num)) %>%
  ggplot(aes(x=pop_count, y=snps, fill=population_ID)) +
  geom_violin(position="dodge", trim=FALSE) +
  geom_boxplot(width=0.07, color="black", alpha=0.6) +
  scale_fill_manual(values=c(EUR="dodgerblue2", MENA="mediumvioletred", SIB="darkkhaki", CEA="firebrick2", AFR="olivedrab2", OCE="powderblue", SAS="darksalmon", AME="plum2")) +
  #scale_x_discrete(limits = c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")) +
  theme_bw() +
  theme(
    legend.position="none",
  ) +
  xlab("")

In principle this should be trivial, and I've done it before, but in this case it is not working. For reference, I have also added a link to the post on SO in case anyone finds it more helpful.

R ggplot2 violin-plots • 421 views

updated 5 weeks ago by Ram 43k • written 5 weeks ago by Matteo Ungaro ▴ 100

Ram · Accepted Answer · 2024-03-19

For anyone interested, this is the complete code I ended up using:

variants_dist <- read_excel("/Users/matte/Documents/SGDP_download/Simons_Genome_Diversity_Project-M.xlsm", 10)

# defines order for populations in the main df
variants_dist <- variants_dist %>%
  mutate(population_ID=factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")))

# arrange the data frame by population_ID (factor level order, not alphabetical)
pop_sort <- variants_dist %>% arrange(population_ID)

# per-population sample counts
df_var <- pop_sort %>% group_by(population_ID) %>% summarise(num=n())


### PLOT THE DATA
pop_sort %>%
  left_join(df_var, by="population_ID") %>%
  mutate(pop_count = paste0(population_ID, "\n", "n=", num)) %>%
  # fct_inorder() sets the factor levels to their order of first appearance,
  # which after arrange(population_ID) is the custom population order
  ggplot(aes(x=forcats::fct_inorder(pop_count), y=snps, fill=population_ID)) +
  geom_violin(position="dodge", trim=FALSE) +
  geom_boxplot(width=0.07, color="black", alpha=0.6) +
  scale_fill_manual(values=c(EUR="dodgerblue2", MENA="mediumvioletred", SIB="darkkhaki", CEA="firebrick2", AFR="olivedrab2", OCE="powderblue", SAS="darksalmon", AME="plum2")) +
  theme_bw() +
  theme(legend.position="none") +
  xlab("")

and the result:

[Image: violin_snps_2]
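As an alternative to relabelling the data, the x axis can stay mapped to the population_ID factor while the "n=" text is supplied only as axis labels. The following is a minimal sketch, not the code used above: the toy data frame and its snps values are invented for illustration, and it assumes that a named character vector passed to scale_x_discrete(labels = ...) is matched to the factor levels by name.

```r
library(dplyr)
library(ggplot2)

pop_levels <- c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")

# Toy stand-in for variants_dist; the snps values are invented
toy <- data.frame(
  population_ID = rep(c("SIB", "CEA", "SAS", "AFR"), times = c(5, 6, 7, 8)),
  snps = seq_len(26)
) %>%
  mutate(population_ID = factor(population_ID, levels = pop_levels))

# One row per population that is present, returned in factor level order
counts <- toy %>% count(population_ID, name = "num")

# Named vector: names are the factor levels, values are the axis labels
axis_labels <- setNames(
  paste0(counts$population_ID, "\nn=", counts$num),
  as.character(counts$population_ID)
)

# The axis order comes from the factor levels; the labels only change the text
p <- ggplot(toy, aes(x = population_ID, y = snps, fill = population_ID)) +
  geom_violin(trim = FALSE) +
  scale_x_discrete(labels = axis_labels) +
  theme_bw() +
  theme(legend.position = "none") +
  xlab("")
```

With this layout the ordering logic lives in a single place (the factor levels of population_ID), and the counts can change without affecting the order.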

2

5 weeks ago

fracarb8 ★ 1.6k

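Your x axis is not population_ID but pop_count, which is created by the second mutate. Because paste0() returns a character column (see <chr> below), the factor levels you set on population_ID do not carry over, and ggplot orders the axis alphabetically. pop_count is the column that needs reordering.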

    samples                  population_ID    snps  indels   num pop_count   
    <chr>                    <fct>           <dbl>   <dbl> <int> <chr>       
  1 abh100 - number of:      MENA          4847876 1815572    23 "MENA\nn=23"
  2 abh107 - number of:      MENA          4820146 1746517    23 "MENA\nn=23"
  3 ALB212 - number of:      EUR           4875942 1744015    52 "EUR\nn=52" 
  4 Ale14 - number of:       SIB           4848405 1748094    27 "SIB\nn=27"

I would suggest separating the dataset manipulation from the ggplot call.
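Both points can be sketched together: prepare the data first, and turn pop_count into a factor whose levels follow the population_ID order. This is only an illustration under stated assumptions: the toy data frame and its snps values are invented, and toy_labelled is a hypothetical name.

```r
library(dplyr)

pop_levels <- c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")

# Toy stand-in for variants_dist; the snps values are invented
toy <- data.frame(
  population_ID = c("SIB", "CEA", "SAS", "SAS", "AFR", "CEA"),
  snps = c(10, 20, 30, 40, 50, 60)
)

toy_labelled <- toy %>%
  mutate(population_ID = factor(population_ID, levels = pop_levels)) %>%
  # add the per-population sample count as a column
  add_count(population_ID, name = "num") %>%
  mutate(
    pop_count = paste0(population_ID, "\nn=", num),
    # order the labels by the factor order of population_ID,
    # then use that order as the levels of pop_count
    pop_count = factor(pop_count,
                       levels = unique(pop_count[order(population_ID)]))
  )

levels(toy_labelled$pop_count)
```

The resulting data frame can then be passed to ggplot() with x = pop_count, and the axis follows the custom order instead of the alphabetical one.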

updated 5 weeks ago by Ram 43k • written 5 weeks ago by fracarb8 ★ 1.6k

0

fracarb8, how can I do that while preserving, for instance, the number of samples for each population? I was trying to handle it outside the plotting code with this:

order <- c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")
df_var %>%
  arrange(match(population_ID, order)) -> df_var

which, however, has no effect on the ggplot output. Someone on SO suggested using forcats::fct_inorder(pop_count), which I'm not familiar with. Let me know, thanks!
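For context on the two points above, here is a small sketch of what fct_inorder() does and why row order alone is not enough. The label values are hypothetical. A character vector has no level order of its own, so converting it to a factor sorts the levels alphabetically regardless of how the rows are arranged, whereas fct_inorder() keeps the order of first appearance.

```r
library(forcats)

# Labels as they might appear after sorting the rows into the custom order
# (hypothetical counts, for illustration only)
labels <- c("AFR\nn=2", "AFR\nn=2", "SAS\nn=1", "CEA\nn=2", "CEA\nn=2")

# Default conversion: levels are sorted alphabetically, row order is ignored
alphabetical <- levels(factor(labels))

# fct_inorder(): levels follow the order in which values first appear
in_order <- levels(fct_inorder(labels))

# Base R equivalent of the same idea
base_in_order <- levels(factor(labels, levels = unique(labels)))
```

This is why arranging a data frame only helps when it is followed by something like fct_inorder() on the column that is actually mapped to the x axis.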

written 5 weeks ago by Matteo Ungaro ▴ 100

0

fracarb8, thanks for your interest. In the end, following the advice given on SO, I managed to do it in a clean way. I will still accept the answer because it is relevant and pointed me in the right direction!

written 5 weeks ago by Matteo Ungaro ▴ 100