Dotplot of selected GO terms with z-score via ggplot2
4.3 years ago

Hello everyone,

By using the online tool DAVID (https://david.ncifcrf.gov) for GO enrichment analysis and the R package GOplot (https://wencke.github.io), I was able to associate a z-score with my gene ontology results. I ran the analysis for 3 different conditions (Cond1, Cond2, Cond3) and selected the most interesting GO terms. Now I have a data frame like this:

GO  ID          term                               Cond1_PADJ   Cond1_Zscore  Cond2_PADJ   Cond2_Zscore  Cond3_PADJ   Cond3_Zscore
BP  GO:0030198  extracellular matrix organization  3.89101E-05  -2.683281573  5.06412E-12  3.713069518   2.14344E-07  -5.099019514
BP  GO:0030154  cell differentiation               0.000553172  -0.239045722  0.002762293  0.784464541   4.07139E-06  -8.373983252

At this point I would like to generate a dot plot similar to the one discussed in "Dotplot for filtered pathways result" using ggplot2, but with some adjustments:

1) The size of the dots should be relative to the -log of the PADJ (instead of the gene ratio).
2) The colour of the dots should be relative to the Z-score (instead of the p-value).
3) I would like to separate the lanes of my samples, as in "relevel the clusterProfiler object".

I know there are tools out there, like clusterProfiler, pathfindR, etc., that can generate this graph, but I just need to plot my own results.

Thank you : )

RNA-Seq ggplot2 david GO • 4.0k views

Thank you very much for the tutorial. I am trying to follow it but I am stuck at the plot step because what ggplot expects is basically this format:

Identifier Variable1 Variable2_Cond1 Variable2_Cond2 Variable2_Cond3

In the tutorial's terms, that would be:

Sample  Time    OTU1    OTU2    OTU3

My problem is that I have 2 variables for each condition. So it's like I have:

Sample  Time_OTU1   OTU1    Time_OTU2   OTU2    Time_OTU3   OTU3

Which in my data frame is:

GO_ID   Cond1_PADJ  Cond1_Zscore    Cond2_PADJ  Cond2_Zscore    Cond3_PADJ  Cond3_Zscore

How can I plot both variables?

I actually think the problem is in the melt step, because I have 2 variables and not 1. For this reason, I guess I should do something like in this post (https://stackoverflow.com/questions/1544907/melt-to-two-variable-columns).

My long dataset should be something like this:

GO_ID   PADJ    value   Zscore  value
GOID_1  Cond1   1       Cond1   4
GOID_1  Cond2   2       Cond2   5
GOID_1  Cond3   3       Cond3   6

But instead I get this:

GO_ID   Variable  value
GOID_1  Cond1_Pval  1
GOID_1  Cond2_Pval  2
GOID_1  Cond3_Pval  3
GOID_1  Cond1_ZScore    4
GOID_1  Cond2_ZScore    5
GOID_1  Cond3_ZScore    6

Any help?

You can try pivot_longer() to convert from "wide" to "long": https://tidyr.tidyverse.org/reference/pivot_longer.html

Then use separate() to split your columns: https://tidyr.tidyverse.org/reference/separate.html
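
A minimal sketch of those two steps, assuming the column names shown in the question (Cond1_PADJ, Cond1_Zscore, ...). The object names go_wide, go_long and go_tidy, the helper columns variable, Condition and Measure, and the base-10 log are my own assumptions, not from the original post. A final pivot_wider() is added so that PADJ and Zscore end up side by side, which is one way (not the only one) to get two value columns per condition.

```r
library(tidyr)

# Example rows copied from the question; object name go_wide is hypothetical.
go_wide <- data.frame(
  GO   = c("BP", "BP"),
  ID   = c("GO:0030198", "GO:0030154"),
  term = c("extracellular matrix organization", "cell differentiation"),
  Cond1_PADJ   = c(3.89101e-05, 0.000553172),
  Cond1_Zscore = c(-2.683281573, -0.239045722),
  Cond2_PADJ   = c(5.06412e-12, 0.002762293),
  Cond2_Zscore = c(3.713069518, 0.784464541),
  Cond3_PADJ   = c(2.14344e-07, 4.07139e-06),
  Cond3_Zscore = c(-5.099019514, -8.373983252)
)

# Step 1: wide -> long, one row per term and original column.
go_long <- pivot_longer(
  go_wide,
  cols      = -c(GO, ID, term),
  names_to  = "variable",
  values_to = "value"
)

# Step 2: split e.g. "Cond1_PADJ" into Condition = "Cond1" and Measure = "PADJ".
go_long <- separate(go_long, variable, into = c("Condition", "Measure"), sep = "_")

# Step 3 (assumed extra step): spread Measure back out so that each row
# holds both PADJ and Zscore for one term in one condition.
go_tidy <- pivot_wider(go_long, names_from = Measure, values_from = value)

# Dot size variable; log base 10 is an assumption, the question only says "-log".
go_tidy$negLogPadj <- -log10(go_tidy$PADJ)
```

go_tidy then has one row per term and condition, so one column (negLogPadj) can be mapped to size and another (Zscore) to fill in ggplot2.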

Thank you very much for all the suggestions; in the end I was able to get what I wanted. Now I have another problem: the order of my GO terms within the groups is reversed in the plot! For example, the real order is:

GO1 Group1
GO2 Group1
GO3 Group1
GO1 Group2
GO2 Group2
GO3 Group2
GO2 Group3
GO1 Group3

by using:

pcm$Groups <- factor(pcm$Groups,levels=unique(pcm$Groups))
pcm$term <- factor(pcm$term,levels=unique(pcm$term))

I can see that the order stays the same in the data frame, but when plotting:

library(ggplot2)

# Dot size uses the column labelled "-LogPvalue" (value); fill uses the Z-score column (value_2)
ggplot(pcm, aes(x = Pvalue, y = term)) +
  geom_point(aes(size = value, fill = value_2), alpha = 0.75, shape = 21, stroke = 0.5) +
  scale_size_continuous(range = c(1, 10)) +
  labs(x = "", y = "", size = "-LogPvalue", fill = "Z-score") +
  # Diverging palette centred on a Z-score of 0
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0, space = "Lab", aesthetics = "fill") +
  theme(legend.key = element_blank(),
        axis.text.x = element_text(colour = "black", size = 14, angle = 90, vjust = 0.3, hjust = 1),
        axis.text.y = element_text(colour = "black", size = 14),
        legend.text = element_text(size = 10, colour = "black"),
        legend.title = element_text(size = 11), panel.background = element_blank(),
        panel.border = element_rect(colour = "black", fill = NA, size = 1),
        legend.position = "right", panel.grid.major.y = element_line(colour = "grey95")) +
  # One panel per group, with panel height following the number of terms (stray trailing comma removed)
  facet_grid(Groups ~ ., scales = "free", space = "free") +
  theme(strip.text.y = element_text(size = 10, angle = 90, face = "bold"))

The order within the groups goes in the opposite direction:

    GO3 Group1
    GO2 Group1
    GO1 Group1
    GO3 Group2
    GO2 Group2
    GO1 Group2
    GO2 Group3
    GO1 Group3

Ideas?

Solved:

pcm$term <- fct_rev(pcm$term)

This solved the problem! (fct_rev() comes from the forcats package.)
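
For anyone wondering why reversing helps: ggplot2 places the first level of a discrete y variable at the bottom of the axis, so levels kept in data-frame order are drawn bottom to top. A small sketch, where the vector term is a made-up stand-in for pcm$term and term_rev_base is a hypothetical name for the base R equivalent:

```r
library(forcats)

# Made-up stand-in for pcm$term, with levels in data-frame order.
term <- factor(c("GO1", "GO2", "GO3"), levels = c("GO1", "GO2", "GO3"))

# fct_rev() reverses the level order without touching the values.
term_rev <- fct_rev(term)

# Base R equivalent, for setups without forcats.
term_rev_base <- factor(term, levels = rev(levels(term)))

levels(term)      # "GO1" "GO2" "GO3"  (GO1 would be drawn at the bottom)
levels(term_rev)  # "GO3" "GO2" "GO1"  (GO1 would be drawn at the top)
```

The values themselves are unchanged; only the level order that the axis follows is flipped.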

Instead of unique(pcm$term), try sort(unique(pcm$term))
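
A short sketch of what that change does, using a made-up vector in place of pcm$term (the names by_appearance and alphabetical are hypothetical). With unique() the levels follow the order of first appearance in the data; with sort(unique()) they are alphabetical, which is also what factor() does by default.

```r
# Made-up stand-in for pcm$term, deliberately not in alphabetical order.
term <- c("GO3", "GO1", "GO2", "GO1")

# Levels in order of first appearance in the data.
by_appearance <- factor(term, levels = unique(term))

# Levels in alphabetical order, regardless of how the rows are arranged.
alphabetical <- factor(term, levels = sort(unique(term)))

levels(by_appearance)  # "GO3" "GO1" "GO2"
levels(alphabetical)   # "GO1" "GO2" "GO3"
```

Whether this gives the desired axis order depends on whether the terms should be shown alphabetically or in the order of the data frame.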
