R ggplot2 creating separate plots for different subsets of rows
3.0 years ago

Hi there,

I have a dataframe of gene expression (one row per gene). I would like to only plot certain subsets of genes (e.g. by tissue type). I currently have these subsets defined a priori in another dataframe/list.

library(tidyverse)
library(ggplot2)
library(ggforce) # for paginated plotting

#a df of 'all' genes and their expression at two time points
gene_expression <- data.frame(
  gene = rep(rep(letters[1:10], each = 3), times = 2), # genes a-j, 3 replicates per time point
  timepoint = rep(c('1', '2'), each = 30),              # 30 values at time point 1, then 30 at time point 2
  expression = as.numeric(sample(20:90, size = 60)))

#a dataframe of subsets (tissues) that contains subsets together with the 'genes of interest' for each subset
interesting_genes <- data.frame(
  tissue = as.character(c('heart', 'heart','heart', 'kidney','kidney', 'liver','liver','intestine', 'intestine', 'intestine')),
  gene = as.character(c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g')))

So far, I can only 'manually plot' each subset individually through subsetting prior to plotting. However, I would prefer a loop or mapping to plot all subsets (here tissue types) one after the next.

The desired output would be 1 PDF file for each subset (i.e 1 PDF each for heart, kidney, liver, intestine) with the subset of genes relevant for the tissue type. It would also be great to have each of the PDF titles corresponding to the tissue type.
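As a possible starting point for the per-tissue file names and titles, here is a minimal sketch that assumes the `interesting_genes` table defined above. Base R's `split()` turns the table into a named list with one gene vector per tissue, and the names can then feed both the PDF file name and the plot title. The names `genes_by_tissue` and `pdf_names` and the file naming scheme are made up for illustration.

```r
# Lookup table as defined in the question
interesting_genes <- data.frame(
  tissue = c('heart', 'heart', 'heart', 'kidney', 'kidney', 'liver', 'liver',
             'intestine', 'intestine', 'intestine'),
  gene = c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g'),
  stringsAsFactors = FALSE)

# One character vector of genes per tissue, named by tissue (hypothetical name)
genes_by_tissue <- split(interesting_genes$gene, interesting_genes$tissue)

# One PDF file name per tissue (hypothetical naming scheme)
pdf_names <- paste0(names(genes_by_tissue), "_genes.pdf")

# Show which genes would go into which file
for (k in seq_along(genes_by_tissue)) {
  message(pdf_names[k], ": ", paste(genes_by_tissue[[k]], collapse = ", "))
}
```

Note that `split()` orders the list alphabetically by tissue, not in the order the tissues first appear in the table.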

So far I have managed the code below, which gives me only one subset at a time. I thought that adding another loop could work, but I don't have the coding knowledge to achieve that. Any help would be appreciated.

heart_list <- interesting_genes %>% filter(tissue == 'heart') %>% pull(gene) # creates the heart subset
n_pages <- ceiling(length(heart_list) / 4) # number of pages to plot at 2 x 2 = 4 panels per page

pdf(file = 'heart_genes.pdf', width = 10, height = 7) # write to pdf
for (i in seq_len(n_pages)) {
  print(ggplot(gene_expression %>% filter(gene %in% heart_list),
               aes(x = timepoint, y = expression, group = interaction(timepoint, gene), fill = timepoint)) + # for grouping on multiple columns I use "interaction"
      geom_boxplot() +
      geom_point(position = position_jitterdodge(), aes(group = interaction(timepoint, gene))) +
      scale_x_discrete(limits = c("1", "2")) +
      facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i)) # use facet wrap to plot pages of 2x2 panels, maybe there are better alternatives to facet_wrap_paginate?
}
dev.off() # close the PDF device so the file is actually written

r facet-wrap cowplot for-loop ggplot2 • 1.2k views
3.0 years ago
ATpoint 81k

I am not sure I fully understand, but this solution gives you a named list in which each entry is one plot object, reusing the plotting code you posted. It also saves every plot to disk, with the tissue in the file name. See whether this makes sense to you. I usually avoid for-loops because they clutter the environment with leftover variables; sapply and company are more convenient.

plots_byTissue <- 
sapply(unique(interesting_genes$tissue), function(tis){

  tmp.gene <- 
  interesting_genes %>% 
    filter(tissue==tis) %>%
    pull(gene)

  plotty <- 
  gene_expression %>%
    filter(gene %in% tmp.gene) %>%
    ggplot(aes(x=timepoint, y=expression, group=interaction(timepoint,gene), fill = timepoint)) +
    geom_boxplot() +
    geom_point(position=position_jitterdodge(),aes(group=interaction(timepoint, gene)))

  pdf(paste0("tissue_", tis, ".pdf"))
  print(plotty); dev.off()

  return(plotty)

}, simplify = FALSE)
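
To see what `sapply(..., simplify = FALSE)` hands back, here is a stripped-down sketch in which the plotting is replaced by building the file name only, so it runs without ggplot2. Because the input is a character vector, `sapply` uses its values as names by default (`USE.NAMES = TRUE`), which is why each entry can be pulled out by tissue. The names `tissues` and `files_byTissue` are made up for this illustration.

```r
# Tissue names as they would come out of unique(interesting_genes$tissue)
tissues <- c("heart", "kidney", "liver", "intestine")

# Same pattern as above, but returning only the file name per tissue
files_byTissue <- sapply(tissues, function(tis) {
  paste0("tissue_", tis, ".pdf")
}, simplify = FALSE)

names(files_byTissue)      # the tissue names, in input order
files_byTissue[["liver"]]  # "tissue_liver.pdf"
```

In the same way, `plots_byTissue[["liver"]]` should return the liver plot from the code above, which can then be printed or modified further.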

Hi Atpoint,

thanks so much for your help, this is a really elegant approach.

Would it be possible to make PDFs where each panel contains a single gene, rather than all genes inside a single panel? In my original post I achieved this using facet_wrap_paginate from the ggforce package.

e.g. if I have 8 genes for heart tissue and want 4 panels on each page, I need to plot 2 pages

n_pages = ceiling(8/4) # pages required: with 8 genes for a given tissue and 4 panels per page, you need 2 pages (ceiling also covers gene counts that are not a multiple of 4)
facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i) # added to the plot for page i; gives 2 PDF pages, each containing 2x2 = 4 panels
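
A note on the page count: it has to round up rather than to the nearest integer, otherwise a partly filled last page gets dropped. A small sketch comparing the two, with made-up gene counts and the 2x2 layout assumed from the snippet above (`panels_per_page` and `n_genes` are hypothetical names):

```r
panels_per_page <- 2 * 2       # ncol * nrow
n_genes <- c(3, 8, 9, 10)      # example gene counts per tissue (made up)

pages_round   <- round(n_genes / panels_per_page)    # 1 2 2 2: loses the last page for 9 and 10 genes
pages_ceiling <- ceiling(n_genes / panels_per_page)  # 1 2 3 3: always enough pages

print(pages_round)
print(pages_ceiling)
```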

I have added a screenshot below (left is your code with 3 genes per panel, right is my desired output with 3 panels, each with one gene).

The desired resulting PDFs may contain dozens of pages. The different PDFs may be of varying page number (because the number of genes is different for each tissue).

Apologies if this is not clear, please let me know if you have further questions and thanks again!

[Screenshot: left, your code; right, my desired output]
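
One way this could be approached, offered as a sketch rather than a tested answer from the thread: keep the `sapply` structure from the reply above, but build one plot per page with `facet_wrap_paginate` and print all pages into a single PDF per tissue. The data are re-created in compact form so the block runs on its own. `out_dir`, `panels_per_page`, `tmp.data`, `n_pg` and `pages` are names introduced here for illustration, and the files are written to a temporary folder by assumption.

```r
library(dplyr)
library(ggplot2)
library(ggforce)

# Example data as in the question (compact form)
gene_expression <- data.frame(
  gene = rep(rep(letters[1:10], each = 3), times = 2),
  timepoint = rep(c('1', '2'), each = 30),
  expression = as.numeric(sample(20:90, size = 60)))
interesting_genes <- data.frame(
  tissue = c('heart', 'heart', 'heart', 'kidney', 'kidney', 'liver', 'liver',
             'intestine', 'intestine', 'intestine'),
  gene = c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g'))

out_dir <- tempdir()   # assumption: change to the folder you want the PDFs in
panels_per_page <- 4   # must match ncol * nrow below

plots_byTissue <- sapply(unique(interesting_genes$tissue), function(tis) {

  tmp.gene <- interesting_genes %>% filter(tissue == tis) %>% pull(gene)
  tmp.data <- gene_expression %>% filter(gene %in% tmp.gene)

  # Pages needed for this tissue, rounded up
  n_pg <- ceiling(n_distinct(tmp.data$gene) / panels_per_page)

  # One plot object per page, one gene per panel, tissue as title
  pages <- lapply(seq_len(n_pg), function(i) {
    ggplot(tmp.data, aes(x = timepoint, y = expression,
                         group = interaction(timepoint, gene), fill = timepoint)) +
      geom_boxplot() +
      geom_point(position = position_jitterdodge()) +
      facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i) +
      ggtitle(tis)
  })

  # One multi-page PDF per tissue
  pdf(file.path(out_dir, paste0("tissue_", tis, ".pdf")), width = 10, height = 7)
  for (p in pages) print(p)
  dev.off()

  pages
}, simplify = FALSE)
```

Each entry of `plots_byTissue` is then a list of pages, so `plots_byTissue[["heart"]][[1]]` would be the first heart page. Because the page count is computed per tissue, PDFs with different numbers of pages come out of the same code. ggforce also provides an `n_pages()` helper that reports the page count of an already built paginated plot, which could replace the manual `ceiling()` calculation.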
