How to make box plot for different gene types
16 months ago
I want to visualize the difference in expression between gene types (lncRNAs and mRNAs) with a box plot. I prepared a dataset of normalised read counts for the differentially expressed genes with these columns: Gene, control1, control2, treatment1, treatment2, treatment3, treatment4, type. The type column specifies whether a gene is an lncRNA or an mRNA.

I have seen tutorials that plot by sample or treatment condition, but I want to plot by gene type. Can anyone post a link to a tutorial on how to do this? Thank you.

In R, using gather() from tidyr and ggplot2:

library(tidyverse)
# dummy data
dat <- data.frame(
  Gene = letters[1:20],
  control1 = rnorm(n = 20, 10, 5),
  control2 = rnorm(n = 20, 10, 5),
  treatment1 = rnorm(n = 20, 10, 5),
  treatment2 = rnorm(n = 20, 10, 5),
  treatment3 = rnorm(n = 20, 10, 5),
  type = c(rep("mrna", 15), rep("lncrna", 5))
)

# take a look at the data
head(dat)
#   Gene  control1  control2  treatment1 treatment2 treatment3 type
# 1    a 13.451876 -1.034844  2.09249293  12.403769   8.142288 mrna
# 2    b 11.410395 12.781956  7.27690348  19.894176  11.799505 mrna
# 3    c 13.710790  8.326707 -0.05650038  20.987526   9.430903 mrna
# 4    d 10.077806  9.965193 10.76178585   4.128015   9.708007 mrna
# 5    e  3.350691 12.718572  7.06102222  10.279997   5.590861 mrna
# 6    f  7.102124  6.408744  9.61125820   7.454496  15.636222 mrna

# plot using ggplot2
dat %>%
  gather(-Gene, -type, key = "sample", value = "expr") %>% # reshape to long format
  ggplot(aes(x = type, y = expr)) +
  geom_boxplot()
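
For what it's worth, tidyr 1.0.0 and later mark gather() as superseded in favour of pivot_longer(). Below is a sketch of the same plot with pivot_longer(), assuming the same column layout as the dummy data. The smaller data frame and the name dat_long are made up here for illustration only.

```r
library(tidyverse)

# small made-up data set with the same layout as the dummy data (assumption)
dat <- data.frame(
  Gene = letters[1:6],
  control1 = rnorm(6, 10, 5),
  control2 = rnorm(6, 10, 5),
  treatment1 = rnorm(6, 10, 5),
  type = c(rep("mrna", 4), rep("lncrna", 2))
)

# every column except Gene and type becomes a (sample, expr) pair
dat_long <- dat %>%
  pivot_longer(cols = -c(Gene, type), names_to = "sample", values_to = "expr")

# one box per gene type, pooling all samples
ggplot(dat_long, aes(x = type, y = expr)) +
  geom_boxplot()
```

Each gene contributes one value per sample column, so every box pools all samples for that gene type.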


Thank you very much. I will try this.

How can I add the median values to the plot? Should I create a new object to use in geom_text?

I found this example but could not replicate it.

dataMedian <- summarise(group_by(dataInput, key), MD = median(value))
ggplot(dataInput, aes(key, value)) +
  geom_boxplot() +
  geom_text(data = dataMedian, aes(key, MD, label = MD),
            position = position_dodge(width = 0.8), size = 3, vjust = -0.5)
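
One way that example might be adapted to the dummy data from the answer, as a sketch rather than a definitive solution: reshape to long format, compute one median per gene type into a separate data frame, and hand that data frame to geom_text. The object names long, med and p are made up here, set.seed() is only there to make the dummy data reproducible, and rounding the label to two decimals is a cosmetic choice.

```r
library(tidyverse)

set.seed(1)  # assumption: only so the dummy data is reproducible
dat <- data.frame(
  Gene = letters[1:20],
  control1 = rnorm(20, 10, 5),
  control2 = rnorm(20, 10, 5),
  treatment1 = rnorm(20, 10, 5),
  treatment2 = rnorm(20, 10, 5),
  treatment3 = rnorm(20, 10, 5),
  type = c(rep("mrna", 15), rep("lncrna", 5))
)

# long format: one row per gene/sample pair
long <- dat %>%
  gather(-Gene, -type, key = "sample", value = "expr")

# one median per gene type, kept in its own data frame
med <- long %>%
  group_by(type) %>%
  summarise(MD = median(expr))

# boxplot per gene type, with the median printed just above the median line
p <- ggplot(long, aes(x = type, y = expr)) +
  geom_boxplot() +
  geom_text(
    data = med,
    aes(x = type, y = MD, label = round(MD, 2)),
    size = 3, vjust = -0.5
  )
p
```

The column names in the summary data frame have to match the ones used in the plot's aesthetics (type here, key in the quoted example), which is a common reason such snippets fail when copied as-is.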