Question: Boxplot in ggplot2
krushnach80 wrote (23 days ago):

Why is it so difficult to make plots in ggplot2? I like how much customisation it allows, but the learning curve is steep nevertheless.

Here is my sample data frame:

df <- read.table(header = TRUE, text = "
gene                HSC        CMP
ENSG00000158292.6   1.8102636  2.456869
ENSG00000162496.6   2.6796705  6.203838
ENSG00000117115.10  3.4509115  5.555739
ENSG00000159423.14  3.6809277  5.063446
ENSG00000053372.4   5.7089974  6.851090
")

If I want a boxplot in base R, I can simply write:

boxplot(df[,-1], col=c("red","blue"))

That gives me a boxplot, but when I try the same with ggplot2 I'm having a difficult time:

ex <- melt(df, id.vars=c("HSC", "CMP"))
ggplot(data = ex,
       aes(x = CMP, y = HSC)) +
  geom_boxplot()

I get a single boxplot. What I want is one box for HSC and one for CMP, which is what I get with the simple base R boxplot.

Any help or suggestions with my ggplot2 code would be highly appreciated.


Thank you for such cool neat code ...

written 23 days ago by krushnach80
Kevin Blighe wrote (23 days ago):

Devon got there before me but, as he mentioned, id.vars needs to be set to 'gene'.

Here's a boxplot with scatterplot overlay for anyone else arriving here from Google.

I do agree that ggplot can be difficult to work with. Many functions are redundant in the sense that they do the same thing as others but have different names, and conflicts frequently arise. That said, if you can master ggplot, then you can produce very nice graphics for publications.

library(reshape2)
library(ggplot2)

ex <- melt(df, id.vars=c("gene"))
colnames(ex) <- c("gene","group","exprs")

ggplot(data=ex, aes(x=group, y=exprs)) +

    geom_boxplot(position=position_dodge(width=0.5), outlier.shape=17, outlier.colour="red", outlier.size=0.1, aes(fill=group)) +

    #Choose which colours to use; otherwise, ggplot2 chooses automatically
    #scale_color_manual(values=c("red3", "white", "blue")) + #for scatter plot dots
    scale_fill_manual(values=c("red", "royalblue")) + #for boxplot

    #Add the scatter points (treats outliers same as 'inliers')
    geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

    #Use the black-and-white theme and set the base font size
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position="none",
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=14, face="bold", vjust=1),

        axis.text.x=element_text(angle=45, size=14, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=14, face="bold", vjust=0.5),
        axis.title=element_text(size=14, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=12),  #Text size
        title=element_text(size=12)) +      #Title text size

    #Change the size of the icons/symbols in the legend
    guides(colour=guide_legend(override.aes=list(size=2.5))) +

    #Set x- and y-axes labels
    xlab("Stem cell class") +
    ylab("Expression") +

    #ylim(0, 0) +

    ggtitle("My plot")

[image: boxscatter]


That's nice, but a violin plot would be better ;-)

written 23 days ago by WouterDeCoster

Coincidentally, I just produced a violin plot for other data ;)

ggplot(violinMatrix, aes(x=Sample, y=Expression)) + geom_violin() + theme(axis.text.x = element_text(angle=45, hjust=1))

lol

written 23 days ago by Kevin Blighe
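
For readers who want to try the violin suggestion on the data from this thread, here is a minimal sketch. It assumes only ggplot2 is installed; the long-format data frame `long` and its column names `group` and `exprs` are rebuilt inline for illustration and are not taken from the one-liner above. With only five values per group, the density shapes are indicative at best.

```r
library(ggplot2)

# Long-format copy of the question's data: one row per (group, value) pair.
# The explicit factor levels keep HSC before CMP on the x-axis.
long <- data.frame(
  group = factor(rep(c("HSC", "CMP"), each = 5), levels = c("HSC", "CMP")),
  exprs = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974,
            2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

# Violin per group, with a narrow boxplot drawn inside each violin
p <- ggplot(long, aes(x = group, y = exprs)) +
  geom_violin(aes(fill = group), trim = FALSE) +
  geom_boxplot(width = 0.1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# print(p) draws the plot on the active graphics device
```
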

Thank you both. I've been breaking my head over it.

written 23 days ago by krushnach80

Don't worry. I did the same a few years ago trying to work with ggplot.

written 23 days ago by Kevin Blighe

I'm using your code to make boxplots of normalised versus non-normalised data. What do I have to change so that the boxes are not covered with data points? I tried removing "aes(fill=group)", but my boxplot still looks filled up with dots. Any suggestions?

written 18 days ago by krushnach80

Hello my friend. If you do not want the scatterplot overlaid onto the boxplot, just comment out:

geom_jitter(position=position_jitter(width=0.3), size=3.0, colour="black") +

written 18 days ago by Kevin Blighe

Thank you. I kind of figured it out after playing around with it, but thanks anyway for your prompt response.

written 18 days ago by krushnach80

No problem, good luck with it.

written 18 days ago by Kevin Blighe
Devon Ryan wrote (23 days ago):

ex = melt(df, id.vars="gene")
ggplot(ex, aes(x=variable, y=value)) + geom_boxplot()

Your melt() command produced nonsensical output.
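
A minimal sketch of why the original call misbehaves, assuming the reshape2 package and the five-row data frame from the question (rebuilt inline here so the snippet is self-contained; the names `wrong` and `right` are only for illustration):

```r
library(reshape2)

df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

# id.vars = HSC and CMP: these columns are kept as they are, and the only
# column left to melt is 'gene', so 'value' ends up holding the gene IDs.
wrong <- melt(df, id.vars = c("HSC", "CMP"))

# id.vars = gene: HSC and CMP are stacked into 'value', and 'variable'
# records which column each number came from. This is what ggplot2 needs.
right <- melt(df, id.vars = "gene")

str(wrong)  # 5 rows: HSC, CMP, variable, value (gene IDs)
str(right)  # 10 rows: gene, variable (HSC/CMP), value (expression)
```

With `wrong`, aes(x = CMP, y = HSC) maps two continuous columns, so geom_boxplot() has no grouping to split on and draws a single box. With `right`, x = variable is categorical and each level gets its own box.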


This is good but I would use gather from tidyr. The package tidyr is the evolution of reshape2, and it contains more functions to massage data and reshape it for ggplot2/tidyverse.

written 23 days ago by Giovanni M Dall'Olio
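
For anyone who wants to try the tidyr route, here is a minimal sketch, assuming tidyr and ggplot2 are installed and using the data frame from the question (rebuilt inline). The column names `group` and `exprs` are chosen here for illustration. pivot_longer() is the newer interface that supersedes gather() and needs tidyr 1.0.0 or later; both are shown.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(
  gene = c("ENSG00000158292.6", "ENSG00000162496.6", "ENSG00000117115.10",
           "ENSG00000159423.14", "ENSG00000053372.4"),
  HSC = c(1.8102636, 2.6796705, 3.4509115, 3.6809277, 5.7089974),
  CMP = c(2.456869, 6.203838, 5.555739, 5.063446, 6.851090)
)

# gather(): stack every column except 'gene'.
# factor_key = TRUE stores 'group' as a factor in the original column order,
# so HSC is plotted before CMP instead of alphabetically.
long_gather <- gather(df, key = "group", value = "exprs", -gene,
                      factor_key = TRUE)

# pivot_longer(): same reshape; 'group' comes back as a character column here
long_pivot <- pivot_longer(df, cols = -gene,
                           names_to = "group", values_to = "exprs")

p <- ggplot(long_gather, aes(x = group, y = exprs)) + geom_boxplot()
# print(p) draws the plot on the active graphics device
```
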

Agreed and that's what I teach our students, but I don't want to complicate things when answering a simple "why does X not work" question :)

written 23 days ago by Devon Ryan

okay let me do this ...

written 23 days ago by krushnach80

Thank you very much

written 23 days ago by krushnach80
cpad0112 wrote (23 days ago):

options(stringsAsFactors = FALSE)
df <- read.csv("test.txt", sep = "\t")
library(reshape2)
library(ggplot2)
df_melt <- melt(df, id.vars = "gene")
ggplot(df_melt, aes(variable, value)) +
  stat_boxplot(geom = "errorbar", width = .5) +   # end caps on the whiskers
  geom_boxplot(aes(fill = variable)) +
  theme_bw() +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
  # connect the group medians; 'fun' replaces 'fun.y', deprecated since ggplot2 3.3.0
  stat_summary(fun = median, colour = "red", geom = "line", aes(group = 1)) +
  geom_jitter(position = position_jitter(0.2))

[image: Rplot]

Input:

> df
                gene      HSC      CMP
1  ENSG00000158292.6 1.810264 2.456869
2  ENSG00000162496.6 2.679670 6.203838
3 ENSG00000117115.10 3.450912 5.555739
4 ENSG00000159423.14 3.680928 5.063446
5  ENSG00000053372.4 5.708997 6.851090

Nice, but what is the point of connecting the two medians with a red line? I don't mean to be rude here, but unless I'm missing something, that line is just "polluting" the data.

written 23 days ago by Carlo Yague

There were several requests on SO to connect group means. In addition, there were requests to show the data points as well (the jitter here). Searching for "connecting means in ggplot" yields several SO questions, some of which concern boxplots as well. Lines and colours can be customised, as you are aware.

written 23 days ago by cpad0112

Yeah I guess this can make sense for time series analysis or things like that...

written 23 days ago by Carlo Yague

You are talented cpad

written 23 days ago by Kevin Blighe

Nowhere near the luminaries of Biostars here... (including you)

written 23 days ago by cpad0112