Question: Boxplot in ggplot2
krushnach80570 wrote:

Why is it so difficult to make plots in ggplot2? I like how much it allows you to customise, but the learning curve is steep nevertheless.

Here is my sample dataframe

df <-           gene        HSC       CMP
       ENSG00000158292.6  1.8102636  2.456869
       ENSG00000162496.6  2.6796705  6.203838
       ENSG00000117115.10  3.4509115  5.555739
       ENSG00000159423.14  3.6809277  5.063446
       ENSG00000053372.4  5.7089974  6.851090
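For anyone who wants to reproduce this, the table above can be rebuilt as a data frame. This is only a sketch: it assumes the three columns shown (gene, HSC, CMP) are the whole data set, and it sets stringsAsFactors = FALSE so that gene stays a character column.

```r
# Rebuild the sample data shown above (values copied from the question)
df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC  = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP  = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090),
  stringsAsFactors = FALSE
)

# Check the structure: one character column and two numeric columns
str(df)
```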

If I want to plot a boxplot in base R, I can simply write boxplot(df[,-1],col=c("red","blue"))

I get a boxplot, but when I try the same with ggplot2 I have a difficult time:

ex <- melt(df, id.vars=c("HSC", "CMP"))
ggplot(data = ex,
       aes(x = CMP, y = HSC)) +
  geom_boxplot()

I get a single boxplot. What I want is one box each for HSC and CMP, which is what I get with the simple base R boxplot.

Any help or suggestions with my ggplot2 code would be highly appreciated.

R • 3.5k views
ADD COMMENTlink modified 21 months ago • written 21 months ago by krushnach80570

Thank you for such cool neat code ...

ADD REPLYlink written 21 months ago by krushnach80570
Devon Ryan91k wrote:
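library(reshape2)
library(ggplot2)
ex <- melt(df, id.vars = "gene")  # gene is the identifier; HSC and CMP are stacked into variable/value
ggplot(ex, aes(x = variable, y = value)) + geom_boxplot()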

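Your melt() command produced nonsensical output: with HSC and CMP as id.vars, the only column left to melt was gene, so the expression values never ended up in a single value column.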
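To see what went wrong, the two melt() calls can be compared directly. The sketch below assumes the df from the question (columns gene, HSC, CMP); the object names wrong and long are only illustrative.

```r
library(reshape2)

# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene HSC CMP
ENSG00000158292.6 1.8102636 2.456869
ENSG00000162496.6 2.6796705 6.203838
ENSG00000117115.10 3.4509115 5.555739
ENSG00000159423.14 3.6809277 5.063446
ENSG00000053372.4 5.7089974 6.851090
")

# HSC and CMP kept as identifiers: only `gene` is left to be melted
wrong <- melt(df, id.vars = c("HSC", "CMP"))

# gene kept as identifier: HSC and CMP are stacked into variable/value
long <- melt(df, id.vars = "gene")

head(wrong)
head(long)
```

In wrong, the value column holds the gene IDs, whereas long has one row per gene and cell type, which is the shape ggplot2 needs for aes(x=variable, y=value).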

ADD COMMENTlink written 21 months ago by Devon Ryan91k

This is good but I would use gather from tidyr. The package tidyr is the evolution of reshape2, and it contains more functions to massage data and reshape it for ggplot2/tidyverse.
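As a rough sketch of what that could look like, assuming the same df as in the question: gather() still works, although in tidyr 1.0.0 and later it is superseded by pivot_longer(), so both are shown. The column names variable and value are chosen here only to match the melt() output.

```r
library(tidyr)
library(ggplot2)

# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene HSC CMP
ENSG00000158292.6 1.8102636 2.456869
ENSG00000162496.6 2.6796705 6.203838
ENSG00000117115.10 3.4509115 5.555739
ENSG00000159423.14 3.6809277 5.063446
ENSG00000053372.4 5.7089974 6.851090
")

# gather(): stack every column except gene into key/value pairs
ex <- gather(df, key = "variable", value = "value", -gene)

# pivot_longer(): the newer equivalent of the call above
ex2 <- pivot_longer(df, cols = -gene, names_to = "variable", values_to = "value")

p <- ggplot(ex, aes(x = variable, y = value)) + geom_boxplot()
print(p)
```

One difference from melt(): here variable is a character column, so ggplot2 orders the boxes alphabetically (CMP before HSC) unless it is converted to a factor with the desired level order.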

ADD REPLYlink written 21 months ago by Giovanni M Dall'Olio26k

Agreed and that's what I teach our students, but I don't want to complicate things when answering a simple "why does X not work" question :)

ADD REPLYlink written 21 months ago by Devon Ryan91k

okay let me do this ...

ADD REPLYlink written 21 months ago by krushnach80570

Thank you very much

ADD REPLYlink written 21 months ago by krushnach80570
Kevin Blighe46k wrote:

Devon got there before me but, as he mentioned, id.vars needs to be set to 'gene'.

Here's a boxplot with scatterplot overlay for anyone else arriving here from Google.

I do agree that ggplot can be difficult to work with. Many functions are redundant in the sense that they do the same thing as others but have different names, and conflicts frequently arise. That said, if you can master ggplot, then you can produce very nice graphics for publications.

library(reshape2)
library(ggplot2)

ex <- melt(df, id.vars=c("gene"))
colnames(ex) <- c("gene","group","exprs")

ggplot(data=ex, aes(x=group, y=exprs)) +

    geom_boxplot(position=position_dodge(width=0.5), outlier.shape=17, outlier.colour="red", outlier.size=0.1, aes(fill=group)) +

    #Choose which colours to use; otherwise, ggplot2 chooses automatically
    #scale_color_manual(values=c("red3", "white", "blue")) + #for scatter plot dots
    scale_fill_manual(values=c("red", "royalblue")) + #for boxplot

    #Add the scatter points (treats outliers same as 'inliers')
    geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

    #Use the black-and-white theme and set the base font size
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position="none",
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=14, face="bold", vjust=1),

        axis.text.x=element_text(angle=45, size=14, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=14, face="bold", vjust=0.5),
        axis.title=element_text(size=14, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=12),  #Text size
        title=element_text(size=12)) +      #Title text size

    #Change the size of the icons/symbols in the legend
    guides(colour=guide_legend(override.aes=list(size=2.5))) +

    #Set x- and y-axes labels
    xlab("Stem cell class") +
    ylab("Expression") +

    #ylim(0, 0) +

    ggtitle("My plot")

[Figure: boxscatter, the boxplot with scatterplot overlay produced by the code above]

ADD COMMENTlink written 21 months ago by Kevin Blighe46k

That's nice, but a violin plot would be better ;-)

ADD REPLYlink written 21 months ago by WouterDeCoster40k

Coincidentally, I just produced a violin plot for other data ;)

ggplot(violinMatrix, aes(x=Sample, y=Expression)) + geom_violin() + theme(axis.text.x = element_text(angle=45, hjust=1))

lol

ADD REPLYlink written 21 months ago by Kevin Blighe46k

Thank you both. I've been breaking my head over it.

ADD REPLYlink written 21 months ago by krushnach80570

Don't worry. I did the same a few years ago trying to work with ggplot.

ADD REPLYlink written 21 months ago by Kevin Blighe46k

I'm using your code to make boxplots of normalised versus non-normalised data. What do I have to do so that the boxes are not filled with data points (dots)? I tried removing "aes(fill=group)", but I still don't get it: I see my boxplot, but it looks filled up with dots. Any suggestions?

ADD REPLYlink written 21 months ago by krushnach80570

Hello my friend. If you do not want the scatterplot overlaid onto the boxplot, just comment out:

geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +
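A minimal sketch of the same idea, for anyone who would rather toggle the overlay than comment the line in and out. The show_points switch is hypothetical and not part of the code above, and the data are the question's values rounded to two decimals.

```r
library(ggplot2)

# Long-format data with the column names used in the answer above
ex <- data.frame(
  group = rep(c("HSC", "CMP"), each = 5),
  exprs = c(1.81, 2.68, 3.45, 3.68, 5.71, 2.46, 6.20, 5.56, 5.06, 6.85)
)

show_points <- FALSE  # hypothetical switch: TRUE adds the scatter overlay

# Boxes only; fill colours the inside of each box and does not draw any points
p <- ggplot(ex, aes(x = group, y = exprs)) +
  geom_boxplot(aes(fill = group)) +
  scale_fill_manual(values = c("red", "royalblue"))

# The dots come from this layer alone, so it is added only on request
if (show_points) {
  p <- p + geom_jitter(position = position_jitter(width = 0.3),
                       size = 3.0, colour = "black")
}

print(p)
```

This is also why removing aes(fill=group) does not help: fill only controls the box colour, whereas the points are drawn by geom_jitter().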
ADD REPLYlink written 21 months ago by Kevin Blighe46k

Thank you. I kind of figured it out after playing around with it, but thank you anyway for your prompt response.

ADD REPLYlink written 21 months ago by krushnach80570

No problem, good luck with it.

ADD REPLYlink written 21 months ago by Kevin Blighe46k
cpad011211k wrote:
options(stringsAsFactors = FALSE)
df <- read.csv("test.txt", sep = "\t")
library(reshape2)
library(ggplot2)
# Reshape to long format: one row per gene and cell type
df_melt <- melt(df, id.vars = "gene")
ggplot(df_melt, aes(variable, value)) +
  stat_boxplot(geom = "errorbar", width = .5) +  # caps on the whisker ends
  geom_boxplot(aes(fill = variable)) +
  theme_bw() +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  # Connect the group medians; ggplot2 >= 3.3.0 uses `fun` (older versions: `fun.y`)
  stat_summary(fun = median, colour = "red", geom = "line", aes(group = 1)) +
  geom_jitter(position = position_jitter(0.2))

[Figure: Rplot, the plot produced by the code above]

Input:

> df
                gene      HSC      CMP
1  ENSG00000158292.6 1.810264 2.456869
2  ENSG00000162496.6 2.679670 6.203838
3 ENSG00000117115.10 3.450912 5.555739
4 ENSG00000159423.14 3.680928 5.063446
5  ENSG00000053372.4 5.708997 6.851090
ADD COMMENTlink modified 21 months ago • written 21 months ago by cpad011211k

Nice, but what is the point of connecting the two medians with a red line? I don't mean to be rude here, but unless I'm missing something, that line is just "polluting" the data.

ADD REPLYlink written 21 months ago by Carlo Yague4.6k

There have been several requests on SO (Stack Overflow) to connect group means. In addition, there were requests to show the data points as well (the jitter here). Searching for "connecting means in ggplot" turns up several such SO questions, some of which involve boxplots. Lines and colours can be customised, as you are aware.

ADD REPLYlink modified 21 months ago • written 21 months ago by cpad011211k

Yeah I guess this can make sense for time series analysis or things like that...

ADD REPLYlink written 21 months ago by Carlo Yague4.6k

You are talented cpad

ADD REPLYlink written 21 months ago by Kevin Blighe46k

Nowhere near the luminaries of Biostars here... (including you)

ADD REPLYlink modified 21 months ago • written 21 months ago by cpad011211k