boxplot using ggplot2 in R
5 months ago
raavi21198

Hello members!!

I have raw microarray expression data consisting of 45 samples and their intensities, and I have converted it into a data frame. However, I am confused about how to plot a boxplot of all 45 samples and also group them as "normal" and "tumor". Please help me out with this. The code I used is as follows:

library(affy)     # ReadAffy(), pm()
library(ggplot2)

read_data <- ReadAffy()  ## read the raw .CEL files
ph <- phenoData(read_data)  # assumed: the definition of ph is not shown in the post
pmexp <- pm(read_data)      # assumed: PM probe intensities, definition not shown in the post

ph$sample
ph@data
ph@data[,1]=c("NB","ND","TB","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC",
              "NB","ND","TB","TC","NB","ND","TB","NB","ND","TB","TC","NB","ND","TB","TC",
              "NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB","TC","NB","ND","TB")
sampleNames=vector()
logs=vector()
for (i in 1:45)
{
  sampleNames=c(sampleNames,rep(ph@data[i,1],dim(pmexp)[1]))
  logs=c(logs,log2(pmexp[,i]))
}
logdata <- data.frame(logint=logs,sampleName=sampleNames)

The structure of this data frame is as follows:

> str(logdata)
'data.frame':	11155455 obs. of  2 variables:
 $ logint    : num  8.79 9.74 11.09 12.38 12.36 ...
 $ sampleName: chr  "NB" "NB" "NB" "NB" ...

1  8.791163         NB
2  9.736402         NB
3 11.091435         NB
4 12.376125         NB
5 12.363587         NB
6 11.574594         NB

> p+geom_boxplot()


Can someone please guide me on how to create a boxplot of these 45 samples using ggplot2 in R, grouping them as normal and tumor samples? The above code gives me a boxplot of only four samples, and I need to plot all of them together.

Thank you

In the absence of data, I suggest the following:

1. Convert the data frame from wide format to long format. (dplyr/tidyr)
2. Attach grouping information for each sample (dplyr)
3. Draw box plot (ggplot)
4. Facet by group (ggplot)
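The four steps can be sketched roughly as follows. This is a minimal illustration on simulated data, not the poster's dataset: the wide table `wide` (6 hypothetical samples S1 to S6, 100 probes each) and the annotation table `groups` are invented here for the example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
# Hypothetical wide table: one column per sample, one row per probe
wide <- as.data.frame(matrix(rnorm(600, mean = 8), ncol = 6,
                             dimnames = list(NULL, paste0("S", 1:6))))

# Hypothetical sample annotation: which sample is normal and which is tumor
groups <- data.frame(sample = paste0("S", 1:6),
                     group  = rep(c("normal", "tumor"), each = 3))

# Step 1: wide -> long (one row per probe x sample)
long <- pivot_longer(wide, everything(), names_to = "sample", values_to = "logint")

# Step 2: attach the grouping information to every row
long <- inner_join(long, groups, by = "sample")

# Steps 3 and 4: one box per sample, one panel per group
p <- ggplot(long, aes(x = sample, y = logint, fill = group)) +
  geom_boxplot() +
  facet_wrap(~group, scales = "free_x")  # each panel shows only its own samples
print(p)
```

The key point is that the x aesthetic maps to a column with one unique value per sample, while the grouping lives in a separate column used for `fill` and faceting.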

Instead of a box plot, consider a violin plot with jittered points.
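A violin plot with jitter could look roughly like this sketch. The data frame `long` below is simulated (4 hypothetical samples, 500 probes each). With millions of probe values, jittering every point would be very slow and unreadable, so this sketch draws the violins from all values but the points from a random subset of 200 rows; that subsampling is a choice made here, not part of the suggestion above.

```r
library(ggplot2)

set.seed(42)
# Hypothetical long-format data: 4 samples x 500 probes, two groups
long <- data.frame(
  sample = rep(paste0("S", 1:4), each = 500),
  group  = rep(c("normal", "tumor"), each = 1000),
  logint = rnorm(2000, mean = rep(c(8, 8, 9, 9), each = 500))
)

# Random subset of rows used only for the jittered points
pts <- long[sample(nrow(long), 200), ]

p <- ggplot(long, aes(x = sample, y = logint, fill = group)) +
  geom_violin() +                                                # full distribution
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) + # median and quartiles
  geom_jitter(data = pts, width = 0.15, size = 0.5, alpha = 0.4)  # subsampled raw values
print(p)
```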

Thank you for your response. I have edited the post to include the data. Could you now let me know where I am going wrong?
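A likely reason for seeing only four boxes: in the loop in the question, `sampleName` is filled with `ph@data[i,1]`, which after the reassignment holds only four distinct labels (NB, ND, TB, TC). ggplot2 therefore pools all arrays that share a label into a single box. A minimal sketch of a fix is to store a unique ID per array plus a separate group column. It runs on a small simulated matrix standing in for `pmexp` (8 hypothetical arrays, 50 probes), and it assumes that labels starting with "N" are normal and labels starting with "T" are tumor; adjust that rule if the labels mean something else.

```r
library(ggplot2)

set.seed(7)
n_probes <- 50
# Hypothetical stand-in for pmexp: one column of raw intensities per array
labels <- c("NB", "ND", "TB", "TC", "NB", "ND", "TB", "TC")
pmexp  <- matrix(2^rnorm(n_probes * length(labels), mean = 9),
                 ncol = length(labels))

# Unique ID per array, e.g. "NB_1", "TB_3", so every array gets its own box
sample_id <- paste0(labels, "_", seq_along(labels))

# Assumption: labels starting with "N" are normal, "T" are tumor
group <- ifelse(startsWith(labels, "N"), "normal", "tumor")

# Vectorised replacement for the loop: as.vector() reads the matrix column by
# column, so rep(..., each = n_probes) lines the IDs up with the values
logdata <- data.frame(
  logint     = as.vector(log2(pmexp)),
  sampleName = factor(rep(sample_id, each = n_probes), levels = sample_id),
  group      = rep(group, each = n_probes)
)

p <- ggplot(logdata, aes(x = sampleName, y = logint, fill = group)) +
  geom_boxplot(outlier.shape = NA) +   # hiding outliers also speeds up large plots
  guides(x = guide_axis(angle = 90))
print(p)
```

With the real data, the same idea would presumably use `ph@data[, 1]` as `labels` and `nrow(pmexp)` as `n_probes`; building the columns in one step also avoids growing vectors with `c()` inside the loop, which is slow at roughly 11 million values.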

5 months ago

Here is an example I built from the Bioconductor arrays workflow, https://bioconductor.org/packages/devel/workflows/vignettes/arrays/inst/doc/arrays.html

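library(affy)   # Affymetrix pre-processing
library(limma)  # two-color pre-processing; differential expression
# import the "phenotype" data describing the experimental design (step taken from the vignette)
phenoData <- read.AnnotatedDataFrame(system.file("extdata", "pdata.txt", package = "arrays"))
celfiles <- system.file("extdata", package = "arrays")
eset <- justRMA(phenoData = phenoData, celfile.path = celfiles)  # RMA normalisation
df <- as.data.frame(exprs(eset))  # probe sets x CEL files
pdata <- pData(eset)              # sample annotation (IVT, Sensitivity)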

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

df %>%
  # wide -> long: one row per probe set x CEL file
  pivot_longer(everything(), names_to = "cels", values_to = "vals") %>%
  # attach the sample annotation by CEL file name
  inner_join(., rownames_to_column(pdata), by = c("cels" = "rowname")) %>%
  ggplot(., aes(cels, vals, fill = Sensitivity)) +
  geom_boxplot() +
  facet_wrap(~IVT, scales = "free") +
  xlab("") +
  ylab("") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90),
        axis.text = element_text(size = 18),
        strip.text = element_text(size = 18),
        legend.text = element_text(size = 18),
        legend.title = element_text(size = 18))


Code suggestions:

a) Use theme_set() with a theme such as theme_bw(base_size = ...) to define the theme and a base font size for all text (axis text, titles, strip and legend labels) in a single command. This saves the multiple size arguments in theme().

b) Rotate the x-axis labels with guide_axis(angle = ...) rather than element_text(angle = ...). guide_axis() also sets the horizontal and vertical justification, so the labels stay aligned with their ticks even at angles such as 45°.

c) Put the legend on top so that it does not take width away from the plot panels. Again, the sizes of all fonts and labels are scaled from the base_size set in the theme_set() command at the top.

theme_set(theme_bw(base_size = 15))  # one base size for all text elements
df %>%
  pivot_longer(everything(), names_to = "cels", values_to = "vals") %>%
  inner_join(., rownames_to_column(pdata), by = c("cels" = "rowname")) %>%
  ggplot(., aes(cels, vals, fill = Sensitivity)) +
  geom_boxplot() +
  facet_wrap(~IVT, scales = "free") +
  xlab("") +
  ylab("") +
  guides(x = guide_axis(angle = 45)) +  # rotated, correctly justified labels
  theme(legend.position = "top")


By the way, the code example you use requires the arrays package to be installed to access its extdata files: BiocManager::install("arrays").

Thank you so much for providing this example.