Question: Boxplot in ggplot2
krushnach80 wrote (9 months ago):

Why is it so difficult to make things in ggplot2? I like how much customisation it allows, but the learning curve is steep nevertheless.

Here is my sample data frame:

df <-           gene        HSC       CMP
       ENSG00000158292.6  1.8102636  2.456869
       ENSG00000162496.6  2.6796705  6.203838
       ENSG00000117115.10  3.4509115  5.555739
       ENSG00000159423.14  3.6809277  5.063446
       ENSG00000053372.4  5.7089974  6.851090
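
For anyone who wants to reproduce the examples in this thread, the sample data can be typed in roughly as follows. This is only a sketch: the values are copied from the table above, and stringsAsFactors = FALSE only matters on R versions before 4.0.

```r
# Sample data copied from the table in the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090),
  stringsAsFactors = FALSE  # keep gene IDs as character on R < 4.0
)

# Inspect the column types: one character column, two numeric columns
str(df)
```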

If I have to plot a boxplot in base R, I can simply write boxplot(df[,-1],col=c("red","blue"))

That gives me a boxplot, but I am having a difficult time doing the same with ggplot2:

ex <- melt(df, id.vars=c("HSC", "CMP"))
ggplot(data = ex,
       aes(x = CMP, y = HSC)) +
  geom_boxplot()

I get a single boxplot. What I want is one box each for HSC and CMP, as I get with the base R boxplot().

Any help or suggestions with my ggplot2 code would be highly appreciated.


Thank you for such cool neat code ...

(reply by krushnach80, 9 months ago)

Devon Ryan (Freiburg, Germany) wrote (9 months ago):
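library(reshape2); library(ggplot2)
# 'gene' is the only ID column; HSC and CMP are stacked into variable/value
ex = melt(df, id.vars="gene")
ggplot(ex, aes(x=variable, y=value)) + geom_boxplot()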

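Your melt() command produced nonsensical output: with id.vars=c("HSC", "CMP") the two expression columns are kept as identifiers and only the gene column is melted, so there is no grouping column to put on the x-axis.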
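
To see the difference, it can help to print both melted tables side by side. A minimal sketch, assuming reshape2 is installed; the data frame holds only the first three rows of the question's table, and the names wrong and right are purely illustrative:

```r
library(reshape2)

# First three rows of the sample data from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115),
  CMP  = c(2.456869, 6.203838, 5.555739),
  stringsAsFactors = FALSE
)

# Melt from the question: HSC and CMP stay as columns, only 'gene' is melted
wrong <- melt(df, id.vars = c("HSC", "CMP"))
print(wrong)

# Melt from the answer: one row per gene/cell-type pair
right <- melt(df, id.vars = "gene")
print(right)

# The 'variable' column is what ends up on the x-axis
print(levels(right$variable))
```

In the second table every expression value sits in value and its cell type in variable, which is the long layout that aes(x=variable, y=value) expects.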


This is good but I would use gather from tidyr. The package tidyr is the evolution of reshape2, and it contains more functions to massage data and reshape it for ggplot2/tidyverse.

(reply by Giovanni M Dall'Olio, 9 months ago)
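
For reference, the tidyr route suggested above might look like the sketch below. It assumes tidyr and ggplot2 are installed and reuses the first three rows of the question's data; the column names variable and value are chosen only to match the melt() output.

```r
library(tidyr)
library(ggplot2)

# First three rows of the sample data from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115),
  CMP  = c(2.456869, 6.203838, 5.555739),
  stringsAsFactors = FALSE
)

# gather() plays the role of melt(); -gene means "every column except gene".
# factor_key = TRUE keeps the original column order (HSC, then CMP) on the x-axis.
ex <- gather(df, key = "variable", value = "value", -gene, factor_key = TRUE)

# Newer tidyr versions (>= 1.0.0) offer pivot_longer() for the same reshaping:
# ex <- pivot_longer(df, cols = -gene, names_to = "variable", values_to = "value")

p <- ggplot(ex, aes(x = variable, y = value)) + geom_boxplot()
print(p)
```

Without factor_key = TRUE, gather() returns variable as a character column, so ggplot2 would order the boxes alphabetically (CMP before HSC).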

Agreed and that's what I teach our students, but I don't want to complicate things when answering a simple "why does X not work" question :)

(reply by Devon Ryan, 9 months ago)

okay let me do this ...

(reply by krushnach80, 9 months ago)

Thank you very much

(reply by krushnach80, 9 months ago)

Kevin Blighe (USA / Europe / Brazil) wrote (9 months ago):

Devon got there before me but, as he mentioned, id.vars needs to be set to 'gene'.

Here's a boxplot with scatterplot overlay for anyone else arriving here from Google.

I do agree that ggplot2 can be difficult to work with. Many functions are redundant in the sense that they do the same thing as others under different names, and conflicts frequently arise. That said, if you can master ggplot2, then you can produce very nice graphics for publications.

library(reshape2)
library(ggplot2)

ex <- melt(df, id.vars=c("gene"))
colnames(ex) <- c("gene","group","exprs")

ggplot(data=ex, aes(x=group, y=exprs)) +

    geom_boxplot(position=position_dodge(width=0.5), outlier.shape=17, outlier.colour="red", outlier.size=0.1, aes(fill=group)) +

    #Choose which colours to use; otherwise, ggplot2 choose automatically
    #scale_color_manual(values=c("red3", "white", "blue")) + #for scatter plot dots
    scale_fill_manual(values=c("red", "royalblue")) + #for boxplot

    #Add the scatter points (treats outliers same as 'inliers')
    geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

    #Use the black-and-white theme and set the base font size
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position="none",
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=14, face="bold", vjust=1),

        axis.text.x=element_text(angle=45, size=14, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=14, face="bold", vjust=0.5),
        axis.title=element_text(size=14, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=12),  #Text size
        title=element_text(size=12)) +      #Title text size

    #Change the size of the icons/symbols in the legend
    guides(colour=guide_legend(override.aes=list(size=2.5))) +

    #Set x- and y-axes labels
    xlab("Stem cell class") +
    ylab("Expression") +

    #ylim(0, 0) +

    ggtitle("My plot")

[Figure: boxplot with scatterplot overlay]


That's nice, but a violin plot would be better ;-)

(reply by WouterDeCoster, 9 months ago)

Coincidentally, I just produced a violin plot for other data ;)

ggplot(violinMatrix, aes(x=Sample, y=Expression)) + geom_violin() + theme(axis.text.x = element_text(angle=45, hjust=1))

lol

(reply by Kevin Blighe, 9 months ago)
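
Applied to the HSC/CMP data from the question, a violin version could look something like the sketch below. It assumes reshape2 and ggplot2 are installed; trim = FALSE and the narrow jitter are just illustrative choices, and with only five points per group the violin shapes should not be over-interpreted.

```r
library(reshape2)
library(ggplot2)

# Sample data copied from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090),
  stringsAsFactors = FALSE
)

# Long format: one row per gene/cell-type pair
ex <- melt(df, id.vars = "gene")

p <- ggplot(ex, aes(x = variable, y = value, fill = variable)) +
  geom_violin(trim = FALSE) +             # do not cut the tails at the data range
  geom_jitter(width = 0.1, height = 0) +  # show raw points, jittered horizontally only
  theme_bw()
print(p)
```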

Thank you both. I have been breaking my head over it.

(reply by krushnach80, 9 months ago)

Don't worry. I did the same a few years ago trying to work with ggplot.

(reply by Kevin Blighe, 9 months ago)

I'm using your code to make boxplots of normalised versus non-normalised data. What do I have to change so that the boxes are not filled with data points (dots)? I tried removing "aes(fill=group)", but the boxplot still looks filled up with dots. Any suggestions?

(reply by krushnach80, 9 months ago)

Hello my friend. If you do not want the scatterplot overlaid onto the boxplot, just comment out:

geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

(reply by Kevin Blighe, 9 months ago)

Thank you. I had more or less figured it out after playing around with it, but thanks anyway for your prompt response.

(reply by krushnach80, 9 months ago)

No problem, good luck with it.

(reply by Kevin Blighe, 8 months ago)

cpad0112 (India) wrote (9 months ago):
options(stringsAsFactors = FALSE)
df = read.csv("test.txt", sep="\t")  # tab-separated file holding the table shown under "Input"
library(reshape2)
library(ggplot2)
df_melt=melt(df,id.vars="gene")
ggplot(df_melt, aes(variable,value)) +
  stat_boxplot(geom="errorbar", width=.5)+
  geom_boxplot(aes(fill=variable))+
  theme_bw()+
  theme(axis.title.x=element_blank(), axis.title.y=element_blank())+
  stat_summary(fun=median, colour="red", geom="line", aes(group = 1))+  # use fun.y=median on ggplot2 < 3.3.0
  geom_jitter(position = position_jitter(0.2))

[Figure: boxplots of HSC and CMP with jittered points and a red line connecting the medians]

Input:

> df
                gene      HSC      CMP
1  ENSG00000158292.6 1.810264 2.456869
2  ENSG00000162496.6 2.679670 6.203838
3 ENSG00000117115.10 3.450912 5.555739
4 ENSG00000159423.14 3.680928 5.063446
5  ENSG00000053372.4 5.708997 6.851090

Nice, but what is the point of connecting the two medians with a red line? I don't mean to be rude here, but unless I'm missing something, that line is just "polluting" the data.

(reply by Carlo Yague, 9 months ago)

There were several requests on Stack Overflow (SO) to connect group means, and also requests to show the underlying data (hence the jitter here). Searching for "connecting means in ggplot" turns up several such SO questions, some of them about boxplots. Lines and colours can be customised, as you are aware.

(reply by cpad0112, 9 months ago)
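
Since the answer's code connects medians while the requests mentioned here were about means, a mean-based variant might look like the sketch below. It assumes reshape2 and ggplot2 (3.3.0 or later, for the fun argument) are installed; the extra point layer and the group_means check are illustrative additions, not part of the original answer.

```r
library(reshape2)
library(ggplot2)

# Sample data copied from the question
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090),
  stringsAsFactors = FALSE
)
ex <- melt(df, id.vars = "gene")

p <- ggplot(ex, aes(variable, value)) +
  geom_boxplot(aes(fill = variable)) +
  # Connect the group means; on ggplot2 < 3.3.0 write fun.y = mean instead
  stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "red") +
  # Mark each mean so it can be told apart from the median bar
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  theme_bw()
print(p)

# The same means computed directly, as a sanity check on what the line connects
group_means <- tapply(ex$value, ex$variable, mean)
print(group_means)
```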

Yeah I guess this can make sense for time series analysis or things like that...

(reply by Carlo Yague, 9 months ago)

You are talented cpad

(reply by Kevin Blighe, 9 months ago)

Nowhere near the luminaries of Biostars here... (including you)

(reply by cpad0112, 9 months ago)