change order of violin plots in ggplot2
5 weeks ago
Matteo Ungaro ▴ 100

Hi there, I'm facing the following problem: I need to custom-order the violin plots for my populations as follows: AFR, EUR, MENA, SAS, CEA, SIB, OCE and AME. However, for some reason, R doesn't seem to honor my mutate(population_ID = factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME"))) %>% line, which should do exactly that...

Below is how the plot currently looks:

[image: violin_snps]

and the corresponding code:

library(grid)
library(ragg)
library(Cairo)
library(ggh4x)
library(readr)
library(dplyr)
library(readxl)
library(tibble)
library(scales)
library(ggpubr)
library(gtable)
library(ggplot2)
library(hrbrthemes)
library(reticulate)
library(colorspace)
library(introdataviz)

variants_dist <- read_excel("path/to/file.xlsm", 10)
df_var = variants_dist %>% group_by(population_ID) %>% summarise(num=n())

### PLOT THE DATA
variants_dist %>%
  left_join(df_var) %>%
  mutate(population_ID = factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME"))) %>%
  mutate(pop_count = paste0(population_ID, "\n", "n=", num)) %>%
  ggplot(aes(x=pop_count, y=snps, fill=population_ID)) +
  geom_violin(position="dodge", trim=FALSE) +
  geom_boxplot(width=0.07, color="black", alpha=0.6) +
  scale_fill_manual(values=c(EUR="dodgerblue2", MENA="mediumvioletred", SIB="darkkhaki", CEA="firebrick2", AFR="olivedrab2", OCE="powderblue", SAS="darksalmon", AME="plum2")) +
  #scale_x_discrete(limits = c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")) +
  theme_bw() +
  theme(
    legend.position="none",
  ) +
  xlab("")

In principle, this should be trivial, and I've done it before, but in this case it is not working. For reference, I've also added a link to the post on SO in case anyone finds it more helpful.

R ggplot2 violin-plots
5 weeks ago
Matteo Ungaro ▴ 100

For anyone interested, the complete code I ended up using is below:

variants_dist <- read_excel("/Users/matte/Documents/SGDP_download/Simons_Genome_Diversity_Project-M.xlsm", 10)

# defines order for populations in the main df
variants_dist <- variants_dist %>%
  mutate(population_ID=factor(population_ID, levels=c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")))

#arrange the new df based on population_ID
variants_dist %>% arrange(population_ID) -> pop_sort

df_var = pop_sort %>% group_by(population_ID) %>% summarise(num=n())


### PLOT THE DATA
pop_sort %>%
  left_join(df_var) %>%
  mutate(pop_count = paste0(population_ID, "\n", "n=", num)) %>%
  ggplot(aes(x=forcats::fct_inorder(pop_count), y=snps, fill=population_ID)) + #I think forcats::fct_inorder(pop_count) enforces the order of the main df
  geom_violin(position="dodge", trim=FALSE) +
  geom_boxplot(width=0.07, color="black", alpha=0.6) +
  scale_fill_manual(values=c(EUR="dodgerblue2", MENA="mediumvioletred", SIB="darkkhaki", CEA="firebrick2", AFR="olivedrab2", OCE="powderblue", SAS="darksalmon", AME="plum2")) +
  theme_bw() +
  theme(
    legend.position="none",
  ) +
  xlab("")

and the result:

[image: violin_snps_2]

5 weeks ago
fracarb8 ★ 1.6k

Your x axis is not population_ID but pop_count, which is created by mutate(). Since paste0() returns a character vector, pop_count is the column that needs reordering.

    samples                  population_ID    snps  indels   num pop_count   
    <chr>                    <fct>           <dbl>   <dbl> <int> <chr>       
  1 abh100 - number of:      MENA          4847876 1815572    23 "MENA\nn=23"
  2 abh107 - number of:      MENA          4820146 1746517    23 "MENA\nn=23"
  3 ALB212 - number of:      EUR           4875942 1744015    52 "EUR\nn=52" 
  4 Ale14 - number of:       SIB           4848405 1748094    27 "SIB\nn=27"
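
A quick way to see why the factor() call gets lost, as a minimal sketch on made-up values (the counts below are invented for illustration, not taken from the real data): paste0() returns a character vector, so the level order set on population_ID is not carried over to pop_count, and a character column mapped to a discrete axis ends up in alphabetical order.

```r
# Toy values (made up); levels given in the desired, non-alphabetical order
pop <- factor(c("SAS", "CEA", "AME"), levels = c("SAS", "CEA", "AME"))
levels(pop)                      # "SAS" "CEA" "AME"

# paste0() converts the factor to character, dropping the level order
label <- paste0(pop, "\n", "n=", c(4, 5, 3))
is.character(label)              # TRUE

# Turning the labels back into a factor without explicit levels
# sorts them alphabetically, so AME now comes first
levels(factor(label))
```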

I would suggest separating the dataset manipulation from the ggplot call.
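
One possible way to do that separation, sketched on a toy data frame that stands in for variants_dist (the rows and the names pop_order, plot_df and p are assumptions for illustration): build the labels first, turn them into a factor whose levels follow the desired population order, and only then plot.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for variants_dist (values are made up for illustration)
variants_dist <- data.frame(
  population_ID = c("SAS", "AFR", "AME", "AFR", "SAS", "AFR", "AME"),
  snps = c(4.8e6, 5.6e6, 4.7e6, 5.5e6, 4.9e6, 5.7e6, 4.6e6)
)
pop_order <- c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")

# Step 1: data manipulation only
plot_df <- variants_dist %>%
  mutate(population_ID = factor(population_ID, levels = pop_order)) %>%
  add_count(population_ID, name = "num") %>%   # per-population sample size
  arrange(population_ID) %>%                   # rows now follow pop_order
  mutate(
    pop_count = paste0(population_ID, "\n", "n=", num),
    # levels taken from the sorted rows, so they follow pop_order too
    pop_count = factor(pop_count, levels = unique(pop_count))
  )

# Step 2: plotting only
p <- ggplot(plot_df, aes(x = pop_count, y = snps, fill = population_ID)) +
  geom_violin(trim = FALSE) +
  theme_bw() +
  theme(legend.position = "none") +
  xlab("")
```

Checking levels(plot_df$pop_count) before plotting is a cheap way to confirm the axis order without having to render the figure.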


fracarb8, how can I do that while preserving the number of samples for each population? I was trying to handle it outside the plotting code with this:

order <- c("AFR", "EUR", "MENA", "SAS", "CEA", "SIB", "OCE", "AME")
df_var %>%
  arrange(match(population_ID, order)) -> df_var

This, however, is not picked up by the ggplot graph. Someone on SO suggested using forcats::fct_inorder(pop_count), which I'm not familiar with. Let me know, thanks!
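
For context, a minimal sketch of what forcats::fct_inorder() does, on made-up labels: it creates a factor whose levels follow the order of first appearance in the vector instead of alphabetical order. That only helps if the rows are already in the wanted order. Sorting df_var alone likely makes no difference, because left_join() keeps the row order of the left-hand table and, in any case, a character column is re-sorted alphabetically on a discrete axis.

```r
library(forcats)

# Made-up labels, already arranged in the desired population order
labels <- c("AFR\nn=3", "AFR\nn=3", "SAS\nn=2", "AME\nn=2")

# Default factor(): levels sorted alphabetically (AFR, AME, SAS)
levels(factor(labels))

# fct_inorder(): levels in order of first appearance (AFR, SAS, AME)
levels(fct_inorder(labels))

# Base R equivalent, without the forcats dependency
levels(factor(labels, levels = unique(labels)))
```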


fracarb8, thanks for your interest. In the end, following the advice given on SO, I managed to do it in a clean way. I will still accept the answer because it is relevant and pointed me in the right direction!
