Hi there,
I have a dataframe of gene expression (one row per gene). I would like to only plot certain subsets of genes (e.g. by tissue type). I currently have these subsets defined a priori in another dataframe/list.
library(tidyverse)
library(ggplot2)
library(ggforce) # for paginated plotting
#a df of 'all' genes and their expression at two time points
gene_expression <- data.frame(
  gene = rep(rep(letters[1:10], each = 3), times = 2), # genes 'a' to 'j', 3 replicates each, for both time points
  timepoint = rep(c('1', '2'), each = 30), # 30 measurements per time point
  expression = as.numeric(sample(20:90, size = 60)))
#a dataframe of subsets (tissues) that contains subsets together with the 'genes of interest' for each subset
interesting_genes <- data.frame(
tissue = as.character(c('heart', 'heart','heart', 'kidney','kidney', 'liver','liver','intestine', 'intestine', 'intestine')),
gene = as.character(c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g')))
So far, I can only 'manually plot' each subset individually through subsetting prior to plotting. However, I would prefer a loop or mapping to plot all subsets (here tissue types) one after the next.
The desired output would be one PDF file per subset (i.e., one PDF each for heart, kidney, liver and intestine) containing only the genes relevant for that tissue type. It would also be great if the title of each PDF corresponded to the tissue type.
So far I have managed the code below, which gives me only one subset at a time. I thought that adding another loop could work, but I don't have the coding knowledge to achieve that. Any help would be appreciated.
heart_list <- interesting_genes %>% filter(tissue == 'heart') %>% pull(gene) # creates the heart subset
n_pages <- ceiling(length(unique(heart_list)) / 4) # number of pages to plot at 2x2 panels per page, instead of a hard-coded 3
pdf(file = 'heart_genes.pdf', width = 10, height = 7) # write to pdf
for (i in seq_len(n_pages)) {
  print(ggplot(gene_expression %>% filter(gene %in% heart_list), aes(x = timepoint, y = expression, group = interaction(timepoint, gene), fill = timepoint)) + # for grouping on multiple columns I use "interaction"
    geom_boxplot() +
    geom_point(position = position_jitterdodge(), aes(group = interaction(timepoint, gene))) +
    scale_x_discrete(limits = c("1", "2")) +
    facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i)) # use facet wrap to plot pdfs of 2x2 panels, maybe there are better alternatives to facet_wrap_paginate?
}
dev.off() # close the PDF device so the file is written to disk
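As a possible starting point for the "one subset after the next" part, the gene lists for all tissues could be built in one step rather than filtering by hand for each tissue. This is only a sketch in base R; `genes_by_tissue` is a name made up for illustration, and the data frame is repeated so the snippet runs on its own.

```r
# Same tissue/gene table as above, repeated so the snippet is self-contained
interesting_genes <- data.frame(
  tissue = c('heart', 'heart', 'heart', 'kidney', 'kidney',
             'liver', 'liver', 'intestine', 'intestine', 'intestine'),
  gene = c('a', 'b', 'c', 'a', 'b', 'f', 'g', 'a', 'f', 'g'),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of genes per tissue
# (genes_by_tissue is a hypothetical name, not from the original post)
genes_by_tissue <- split(interesting_genes$gene, interesting_genes$tissue)

# Looping over the names gives both the tissue label and its gene list
for (tissue_name in names(genes_by_tissue)) {
  cat(tissue_name, ":", paste(genes_by_tissue[[tissue_name]], collapse = ", "), "\n")
}
```

Each element of the list can then take the place of `heart_list` in the plotting code, with the element name used for the file name.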
Hi Atpoint,
Thanks so much for your help, this is a really elegant approach.
Would it be possible to make PDFs where each panel contains a single gene, rather than all genes inside a single panel? In my original post I achieved this using facet_wrap_paginate from the ggforce package.
For example, if I have 8 genes for heart tissue and want 4 panels on each page, I need to plot 2 pages.
I have added a screenshot below (left is your code with 3 genes per panel, right is my desired output with 3 panels, each with one gene).
The resulting PDFs may contain dozens of pages, and the number of pages may differ between PDFs because the number of genes is different for each tissue.
Apologies if this is not clear, please let me know if you have further questions and thanks again!
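For reference, here is a minimal sketch of one way the per-tissue, one-gene-per-panel output might be produced by wrapping the facet_wrap_paginate code from the original post in a function. It is not a definitive solution: `plot_tissue`, `panels_per_page` and `out_dir` are hypothetical names, the toy data is rebuilt so the snippet runs on its own, and files are written to `tempdir()` purely as an assumption for the example.

```r
library(dplyr)
library(ggplot2)
library(ggforce)

# Toy data with the same columns as in the original post
gene_expression <- data.frame(
  gene = rep(rep(letters[1:10], each = 3), times = 2),
  timepoint = rep(c("1", "2"), each = 30),
  expression = as.numeric(sample(20:90, size = 60)),
  stringsAsFactors = FALSE
)
interesting_genes <- data.frame(
  tissue = c("heart", "heart", "heart", "kidney", "kidney",
             "liver", "liver", "intestine", "intestine", "intestine"),
  gene = c("a", "b", "c", "a", "b", "f", "g", "a", "f", "g"),
  stringsAsFactors = FALSE
)

panels_per_page <- 4 # 2 x 2 layout, as in the original post

# Hypothetical helper: writes one paginated PDF for one tissue
plot_tissue <- function(tissue_name, out_dir = tempdir()) {
  genes <- interesting_genes %>% filter(tissue == tissue_name) %>% pull(gene) %>% unique()
  dat <- gene_expression %>% filter(gene %in% genes)
  # Page count depends on the tissue, e.g. 8 genes / 4 panels = 2 pages
  pages <- ceiling(n_distinct(dat$gene) / panels_per_page)

  out_file <- file.path(out_dir, paste0(tissue_name, "_genes.pdf"))
  pdf(file = out_file, width = 10, height = 7)
  on.exit(dev.off()) # close the device even if plotting fails

  for (i in seq_len(pages)) {
    print(
      ggplot(dat, aes(x = timepoint, y = expression, fill = timepoint)) +
        geom_boxplot() +
        geom_point(position = position_jitterdodge()) +
        scale_x_discrete(limits = c("1", "2")) +
        facet_wrap_paginate(~ gene, scales = "free", ncol = 2, nrow = 2, page = i) +
        ggtitle(tissue_name) # tissue type as the title on every page
    )
  }
  out_file
}

# One PDF per tissue; returns the paths of the files that were written
tissues <- unique(as.character(interesting_genes$tissue))
pdf_files <- vapply(tissues, plot_tissue, character(1))
```

Because the page count is computed inside the function, tissues with different numbers of genes end up with PDFs of different lengths without any manual adjustment.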