Dotplot of selected GO terms with z-score via ggplot2
Hello everyone,

Using the online tool DAVID (https://david.ncifcrf.gov) for GO enrichment analysis and the R package GOplot (https://wencke.github.io), I was able to associate a z-score with my gene ontology results. I ran the analysis for 3 different conditions (Cond1, Cond2, Cond3) and selected the most interesting GO terms. Now I have a data frame like this:

GO  ID  term    Cond1_PADJ  Cond1_Zscore    Cond2_PADJ  Cond2_Zscore    Cond3_PADJ  Cond3_Zscore
BP  GO:0030198  extracellular matrix organization   3.89101E-05 -2.683281573    5.06412E-12 3.713069518 2.14344E-07 -5.099019514
BP  GO:0030154  cell differentiation    0.000553172 -0.239045722    0.002762293 0.784464541 4.07139E-06 -8.373983252


At this point I would like to generate a dot plot similar to the one discussed here (A: Dotplot for filtered pathways result) using ggplot2, but with some adjustments:

1) The size of the dots should be relative to the -log of the PADJ (instead of the Gene Ratio)
2) The colour of the dots should be relative to the Z-score (instead of the p-value)
3) I would like to separate the lanes of my samples like this (relevel the clusterProfiler object)

I know there are tools out there like clusterProfiler, pathfindR, etc. that can generate this graph, but I just need to plot my own results.

Thank you : )

Thank you very much for the tutorial. I am trying to follow it, but I am stuck at the plot step because ggplot basically expects this format:

Identifier Variable1 Variable2_Cond1 Variable2_Cond2 Variable2_Cond3

In the tutorial's terms, that would be:

Sample  Time    OTU1    OTU2    OTU3


My problem is that I have 2 variables for each condition, so it's as if I had:

Sample  Time_OTU1   OTU1    Time_OTU2   OTU2    Time_OTU3   OTU3


Which in my data frame is:

GO_ID   Cond1_PADJ  Cond1_Zscore    Cond2_PADJ  Cond2_Zscore    Cond3_PADJ  Cond3_Zscore


How can I plot both variables?

I actually think the problem is in the melt step: because I have 2 variables rather than 1, I guess I should do something like in this post (https://stackoverflow.com/questions/1544907/melt-to-two-variable-columns).

My long dataset should be something like this:

GO_ID   PADJ  value    Zscore  Value
GOID_1  Cond1   1   Cond1   4
GOID_1  Cond2   2   Cond2   5
GOID_1  Cond3   3   Cond3   6


But instead I get this:

GO_ID   Variable  value
GOID_1  Cond1_Pval  1
GOID_1  Cond2_Pval  2
GOID_1  Cond3_Pval  3
GOID_1  Cond1_ZScore    4
GOID_1  Cond2_ZScore    5
GOID_1  Cond3_ZScore    6


Any help?

You can try pivot_longer() to convert from "wide" to "long": https://tidyr.tidyverse.org/reference/pivot_longer.html

Then use separate() to split your columns: https://tidyr.tidyverse.org/reference/separate.html
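To make the suggestion above concrete, here is a minimal sketch using the two rows from the question. It is a one-step variant rather than the exact pivot_longer() + separate() sequence: the special ".value" name in names_to tells pivot_longer() to split each column name at "_" and keep PADJ and Zscore as separate columns. The object names go and go_long, the column name negLogPadj, and the choice of log10 for the "-log" are assumptions made for illustration.

```r
library(tidyr)

# Two rows copied from the data frame in the question
go <- data.frame(
  GO   = c("BP", "BP"),
  ID   = c("GO:0030198", "GO:0030154"),
  term = c("extracellular matrix organization", "cell differentiation"),
  Cond1_PADJ   = c(3.89101e-05, 0.000553172),
  Cond1_Zscore = c(-2.683281573, -0.239045722),
  Cond2_PADJ   = c(5.06412e-12, 0.002762293),
  Cond2_Zscore = c(3.713069518, 0.784464541),
  Cond3_PADJ   = c(2.14344e-07, 4.07139e-06),
  Cond3_Zscore = c(-5.099019514, -8.373983252)
)

# "Cond1_PADJ" is split at "_": the first part goes into the new column Cond,
# the second part (".value") becomes the name of the value column,
# so PADJ and Zscore end up side by side with one row per term and condition.
go_long <- pivot_longer(
  go,
  cols      = -c(GO, ID, term),
  names_to  = c("Cond", ".value"),
  names_sep = "_"
)

# Assumption: "-log of the PADJ" is taken here as -log10
go_long$negLogPadj <- -log10(go_long$PADJ)

print(go_long)
```

With the data in this shape, Cond can go on the x axis, term on the y axis, negLogPadj on size and Zscore on fill or colour.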

Thank you very much for all the suggestions; in the end I was able to get what I wanted. Now I have another problem: the order of my GO terms within the groups is reversed! For example, the real order is:

GO1 Group1
GO2 Group1
GO3 Group1
GO1 Group2
GO2 Group2
GO3 Group2
GO2 Group3
GO1 Group3


By using:

pcm$Groups <- factor(pcm$Groups, levels = unique(pcm$Groups))
pcm$term <- factor(pcm$term, levels = unique(pcm$term))


I can see that the order stays the same in the data frame, but when plotting:

ggplot(pcm, aes(x = Pvalue, y = term)) +
  geom_point(aes(size = value, fill = value_2), alpha = 0.75, shape = 21, stroke = 0.5) +
  scale_size_continuous(range = c(1, 10)) +
  labs(x = "", y = "", size = "-LogPvalue", fill = "Z-score") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0, space = "Lab", aesthetics = "fill") +
  theme(legend.key = element_blank(),
        axis.text.x = element_text(colour = "black", size = 14, angle = 90, vjust = 0.3, hjust = 1),
        axis.text.y = element_text(colour = "black", size = 14),
        legend.text = element_text(size = 10, colour = "black"),
        legend.title = element_text(size = 11), panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill = NA, size = 1),
        legend.position = "right", panel.grid.major.y = element_line(colour = "grey95")) +
  facet_grid(Groups ~ ., scales = "free", space = "free") +
  theme(strip.text.y = element_text(size = 10, angle = 90, face = "bold"))


The order within the groups goes in the opposite direction:

GO3 Group1
GO2 Group1
GO1 Group1
GO3 Group2
GO2 Group2
GO1 Group2
GO2 Group3
GO1 Group3


Ideas?

Solved:

pcm$term <- fct_rev(pcm$term)


This solved the problem! (fct_rev() is from the forcats package.)
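For anyone wondering why the reversal is needed: on a discrete y axis ggplot2 puts the first factor level at the bottom, so levels taken in data order read bottom-to-top in the plot. Here is a small base R sketch of the same fix without forcats; the vector terms is made-up example data mirroring the order listed above, and term_f and term_rev are illustrative names.

```r
# Made-up example mirroring the order listed in the post
terms <- c("GO1", "GO2", "GO3", "GO1", "GO2", "GO3", "GO2", "GO1")

# Levels in order of first appearance: GO1, GO2, GO3
term_f <- factor(terms, levels = unique(terms))

# Reverse only the level order; the values themselves are unchanged.
# This is the base R counterpart of forcats::fct_rev(term_f).
term_rev <- factor(terms, levels = rev(levels(term_f)))

print(levels(term_f))
print(levels(term_rev))
```

Because only the levels are reversed, the data frame rows stay in the same order, while the plot now shows GO1 at the top of each facet.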

Instead of unique(pcm$term), try sort(unique(pcm$term))