Question

Using ggplot2 to make barplots of RNASeq data - maintaining sample metadata when pivoting from wide to long format

5 months ago

Dylan C-C • 0

I am trying to replicate the following plot of my RNA-Seq data, made by the program Biolayout, using ggplot2. Biolayout is a network analysis tool that clusters genes which follow similar patterns of expression across samples. The plot shows the average TPM of all the genes listed on the right-hand side for each of my samples, and the colours above the sample names mark different genotype/tissue groupings. I want to recreate this in ggplot2 so that I have more control over the look of the plot and over the grouping of the samples. [Image: Biolayout plot]

My problem is that I am having difficulty pivoting the data from the wide format to the long format that ggplot2 needs, while keeping the metadata required for grouping and colouring the graph. The following image is an example of what the data looks like. The first 4 rows are metadata about the samples (tissue type, different genotypes) that I want to use in ggplot2 for the grouping. However, when using pivot_longer to melt the data into something usable by ggplot2, you need just a plain matrix of gene names and counts. So I am wondering how I can use this metadata later on, when making the ggplot2 plot, to order the samples. Is it possible to make a separate metadata data frame with the extra information and the linking sample names, and then pull from that when setting the aesthetics of the ggplot?

[Image: example of the data layout]

rnaseq pivot_longer ggplot2 • 1.0k views

ADD COMMENT • link updated 5 months ago by cmdcolin ★ 3.8k • written 5 months ago by Dylan C-C • 0

Your intuition was correct. You want to make two separate data frames and join them on sample name. This code is untested but will probably (hopefully) work.

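library("tidyverse")

# Rows 1-4 hold the sample metadata: reshape to one row per sample,
# with one column per metadata field, so the join below stays one-to-many
df_meta <- df |>
  slice(1:4) |>
  select(!c(gene_name, description)) |>
  rename(meta_type=unique_gene_id) |>
  pivot_longer(!meta_type, names_to="sample_name", values_to="meta_val") |>
  pivot_wider(names_from=meta_type, values_from=meta_val)

# Remaining rows hold the expression values: one row per gene per sample.
# The columns were read alongside text metadata, so convert counts to numeric
df_counts <- df |>
  slice(5:n()) |>
  pivot_longer(!1:3, names_to="sample_name", values_to="count") |>
  mutate(count=as.numeric(count))

# Attach the metadata columns to every gene/sample row
df_merged <- inner_join(df_counts, df_meta, by="sample_name")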
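Once the counts and metadata are joined, the metadata can be mapped straight onto ggplot2 aesthetics. The following is only a rough sketch on made-up toy data: the column names tissue and count, the sample names, and all the values are assumptions for illustration, and it presumes each metadata field ends up as its own column in the joined table. It averages expression across genes per sample, orders the bars by tissue, and colours them by tissue.

```r
library("tidyverse")

# Toy stand-in for the joined long table; names and values are invented
toy <- tibble(
  gene_name   = rep(c("geneA", "geneB"), each = 4),
  sample_name = rep(c("s1", "s2", "s3", "s4"), times = 2),
  tissue      = rep(c("leaf", "leaf", "root", "root"), times = 2),
  count       = c(10, 12, 3, 5, 20, 18, 7, 9)
)

# Mean expression across genes for each sample, keeping its metadata
sample_means <- toy |>
  group_by(sample_name, tissue) |>
  summarise(mean_count = mean(count), .groups = "drop") |>
  arrange(tissue, sample_name) |>
  # Lock in the bar order: factor levels follow the sorted row order
  mutate(sample_name = factor(sample_name, levels = sample_name))

# One bar per sample, coloured by the metadata column
p <- ggplot(sample_means, aes(x = sample_name, y = mean_count, fill = tissue)) +
  geom_col() +
  labs(x = "Sample", y = "Mean TPM", fill = "Tissue") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

Printing p draws the plot. Changing the columns passed to arrange() changes the left-to-right grouping of the samples, because ggplot2 orders a discrete axis by factor levels.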

If you want to reproduce the figure exactly, including the colored grouping bars, it would actually be a lot more straightforward in ComplexHeatmap.