Loop function for ggplot2 bar chart for multiple variants
1
1
4 months ago

Hello, I have a data table like this one:

   value treatment time Protein.A Protein.B
1   mean    group1  24h      1.00     10.00
2   mean    group2  24h      2.00     11.00
3   mean    group3  24h      3.00     12.00
4   mean    group4  24h      4.00     13.00
5   mean    group5  24h      5.00     14.00
6   mean    group1  48h      6.00     15.00
7   mean    group2  48h      7.00     16.00
8   mean    group3  48h      8.00     17.00
9   mean    group4  48h      9.00     18.00
10  mean    group5  48h     10.00     19.00
11    sd    group1  24h      0.15      1.10
12    sd    group2  24h      0.36      1.87
13    sd    group3  24h      0.36      1.68
14    sd    group4  24h      0.72      2.21
15    sd    group5  24h      0.50      2.80
16    sd    group1  48h      0.60      2.40
17    sd    group2  48h      0.84      2.72
18    sd    group3  48h      1.36      2.89
19    sd    group4  48h      1.17      2.16
20    sd    group5  48h      1.90      2.85


For each protein, I want to make a bar chart with x = treatment grouped by time and y = the mean value of that protein (e.g. Protein A). The error bars should come from the sd values of the same protein. Is it possible to do this with ggplot2? Can I write a loop to plot the bar chart for each protein? I have 50 proteins to plot. Thank you.

R • 269 views
1

What have you tried? You seem to have the requirements mapped out very well, so translating them to ggplot2 aesthetics should be pretty straightforward. Start with ggplot(dataset, aes(x=treatment, y=...)), before which you may want to use tidyr::pivot_wider so that mean and sd become their own columns.

1
4 months ago

If you set up the table correctly you don't even need a for-loop.

Your table should look like this (the order of the columns does not matter):

  treatment time Protein mean   sd
1    group1  24h       A    1 0.15
2    group1  24h       B   10 1.10


Then you can do:

ggplot(df, aes(x = treatment, y = mean)) + geom_col() + # create the bar plot, courtesy of the comments from the other users here
geom_errorbar(aes(ymin = mean-sd, ymax = mean+sd), width=.1) + # add errorbars
facet_wrap(~Protein) # will make one box per protein


You may not want to plot all 50 proteins in the same image, but once the protein identity has its own dedicated column it's also easy to loop over it.
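
To illustrate the looping idea, here is a minimal sketch that splits the long table by the Protein column and builds one plot per protein. The toy data frame only copies a few values from the question (24h, group1 and group2), and the object names `df` and `plots` are placeholders, not anything prescribed by ggplot2.

```r
library(ggplot2)

# Toy long-format table, shaped like the one described above
# (a few 24h values taken from the question)
df <- data.frame(
  treatment = rep(c("group1", "group2"), times = 2),
  time      = "24h",
  Protein   = rep(c("A", "B"), each = 2),
  mean      = c(1, 2, 10, 11),
  sd        = c(0.15, 0.36, 1.10, 1.87)
)

# split() gives one data frame per protein; lapply() builds one plot for each
plots <- lapply(split(df, df$Protein), function(d) {
  ggplot(d, aes(x = treatment, y = mean)) +
    geom_col() +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.1) +
    ggtitle(unique(d$Protein))
})

names(plots)  # one list entry per protein
```

Each element of `plots` can then be printed or passed to ggsave() on its own.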

EDIT: modified the bar-plot command. Note that time also needs to be mapped, either via aes(fill = time) or in facet_wrap(time~Protein), or whatever is appropriate for your case.

3

Just to add how to get from your table as it is to what's been mentioned here.

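library(magrittr) # provides the %<>% assignment pipe
library(tidyr)
library(ggplot2)

# stack the Protein.* columns into two columns: Protein (name) and ProtVal (number)
df %<>% pivot_longer(names_to = "Protein", values_to = "ProtVal", cols = c(matches("Protein\\.")))

# spread the "value" column so that mean and sd become their own columns
df %<>% pivot_wider(names_from = value, values_from = ProtVal)

ggplot(df, aes(x = treatment, y = mean)) + geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean-sd, ymax = mean+sd), width= 0.1) +
facet_wrap(time~Protein)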


Also, I think stat = "identity" in geom_bar() was missing from Friederike's answer. I'd wager facet_wrap() also needs time, from what I can gather from the OP?
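
For anyone who wants to check the reshaping step by itself, here is a small self-contained sketch. It uses only a subset of the table from the question (group1 and group2), and the intermediate names `long` and `tidy` are my own placeholders.

```r
library(tidyr)

# Subset of the table from the question: two treatments, both time points
df <- data.frame(
  value     = rep(c("mean", "sd"), each = 4),
  treatment = rep(c("group1", "group2"), times = 4),
  time      = rep(rep(c("24h", "48h"), each = 2), times = 2),
  Protein.A = c(1, 2, 6, 7, 0.15, 0.36, 0.60, 0.84),
  Protein.B = c(10, 11, 15, 16, 1.10, 1.87, 2.40, 2.72)
)

# Step 1: one row per (value, treatment, time, Protein)
long <- pivot_longer(df, cols = starts_with("Protein."),
                     names_to = "Protein", values_to = "ProtVal")

# Step 2: mean and sd become columns, one row per (treatment, time, Protein)
tidy <- pivot_wider(long, names_from = value, values_from = ProtVal)

names(tidy)
```

The resulting `tidy` table has the columns treatment, time, Protein, mean and sd, which is the layout the ggplot2 call above expects.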

3

As a side note geom_col is the shortcut for geom_bar(stat="identity").

1

good catch, will update

1

Thank you so much, it worked very well. It was great to learn about the pivot_longer and pivot_wider functions. They saved me a lot of time.

1

Thank you so much for your help.

I used the approach you mentioned for a single bar chart and it worked well for me. However, this data set is harder to handle because of the large number of proteins that need to be plotted.

0

You should just chunk the data down further (like maybe four proteins per plot); if there are other ways you can group your proteins (in addition to the ways you already have), you should consider grouping the plots along those lines too. Then you chunk the data following this, and plot each set individually.
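
A rough sketch of the chunking idea, assuming a long-format table with columns treatment, time, Protein, mean and sd. The ten protein names and all numbers below are made-up placeholders, and `chunk_size` is just a name I picked.

```r
library(ggplot2)

# Made-up long-format table: 10 placeholder proteins, 5 treatments, 2 time points
proteins <- paste0("Protein.", 1:10)
df <- expand.grid(
  treatment = paste0("group", 1:5),
  time      = c("24h", "48h"),
  Protein   = proteins,
  stringsAsFactors = FALSE
)
df$mean <- seq_len(nrow(df))   # placeholder values, not real data
df$sd   <- df$mean * 0.1

# Assign proteins to chunks of (at most) four
chunk_size <- 4
chunks <- split(proteins, ceiling(seq_along(proteins) / chunk_size))

# One faceted plot per chunk
plots <- lapply(chunks, function(ids) {
  ggplot(df[df$Protein %in% ids, ], aes(x = treatment, y = mean, fill = time)) +
    geom_col(position = position_dodge(width = 0.9)) +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                  position = position_dodge(width = 0.9), width = 0.2) +
    facet_wrap(~Protein)
})

lengths(chunks)  # number of proteins in each chunk
```

If the proteins fall into meaningful groups (pathway, family, and so on), splitting on that grouping column instead of a fixed chunk size would probably give more readable figures.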

0

Yes. I made a loop:

library(dplyr)
library(ggplot2)

name <- unique(df$Protein)
for (i in seq_along(name)) {
  # exact match; str_detect() treats the name as a regex and can match several proteins
  df2 <- df %>% filter(Protein == name[i])
  p <- ggplot(df2, aes(x = treatment, y = mean, fill = factor(treatment))) +
    geom_bar(stat = "identity", position = "dodge") +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.1) +
    facet_wrap(Protein ~ time)

  # saving to .svg with ggsave() requires the svglite package
  ggsave(filename = paste0("Barchart_", name[i], ".svg"), plot = p, width = 10, height = 10)
}


Just a question: is it possible to combine the two plots for 24h and 48h into one? Thank you,

0

Combine in what way?

0

I meant combining 24h and 48h in one chart as two separate groups, where each group has 5 bars representing the 5 treatment groups. Thanks

0

Something like this:

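ggplot(df, aes(x = treatment, y = mean, fill = time)) + geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
geom_errorbar(aes(ymin = mean-sd, ymax = mean+sd), width = 0.2, position = position_dodge(width = 0.9)) + # same dodge width as the bars, so the error bars stay centred on them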
facet_wrap(~Protein)

0

Instead of aes(x = treatment, y = mean, fill = treatment), use aes(x = time, y = mean, fill = treatment) or aes(x = treatment, y = mean, fill = time).