Plotting different gene ontology categories in a barplot
Entering edit mode
3.1 years ago
aradhana • 0

Hi, I have performed gene ontology analysis on a set of differentially expressed genes using the Enrichr package in R.I wanted to construct a barplot in R which shows the count of genes on one axis with their respective go terms. I am new to R, so I am not very familiar with plotting using ggplot2. I have a dataset that has the following columns: Gene Ontology Class - Biological Process, Cellular Component, or Molecular Function, GO terms - GO terms associated with a particular GO class, Number of genes -contains the count of genes associated with the respective GO terms, Adjusted p-value,

I want to create a plot like the one shown below. (taken from a publication) example1

I will be really grateful if anybody can help me out.

HI Gene ontology R • 9.9k views
Entering edit mode

Please post example data. Without example data (data images do not help), it will be difficult to follow the question. Graph in the image is faceted by 3 comparisons (a,b and c in the image). In your image, there is no such data. Without that data, following is an example how to facet and fill colour. Use following steps for faceting and colouring and change it as per your needs:

  1. Load data into R
  2. Load ggplot2
  3. Plot GO term on Y-axis and count on X-axis using geom_bar function.
  4. Facet by comparison (missing from your data image)
  5. Fill by class (GO terms)
  6. Move the legend to the bottom of the plot

Try this example code:


df2=data.frame(class=sample(c("BP","MF","CC"), 30, replace=T), "GO Term"=paste0("term_", sample(1:30, 30, replace = F)), count=sample(10:30, 30, replace = T), comparison=sample(c("A","B","C"), 30, replace = T))

ggplot(df2, aes(count, GO.Term, fill = class)) +
    geom_bar(stat = "identity") +
    facet_wrap( ~ comparison, scales = "free", nrow = 3) +
        aes(label = count),
        color = "black",
        hjust = -0.1,
        size = 4,
        position = position_dodge(0.9)
    ) +
        legend.position = "bottom",
        panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        strip.text.x = element_text(size = 14, face = "bold"),
        strip.background = element_blank()


Entering edit mode

Hi, Thank you so much for your response. my dataset looks like this:


    Class              GO Term                                 Count     Adjusted.P.value

    <chr>              <chr>                                     <dbl>            <dbl>

    1 Cellular Component cytoskeleton (GO:0005856)                    9             0.00155
    2 Cellular Component secretory granule lumen (GO:0034774)         6          0.0133 

    3 Cellular Component ficolin-1-rich granule lumen (GO:1904813)     4          0.0133 

    4 Cellular Component tertiary granule lumen (GO:1904724)           4          0.0133 

    5 Cellular Component tertiary granule (GO:0070820)                 4           0.0253 
    6 Cellular Component ficolin-1-rich granule (GO:0101002)           4          0.0321 
    7  Molecular Function   cadherin binding (GO:0045296)              7          0.00314
    8 Molecular Function    aldehyde-lyase activity (GO:0016832)       2          0.00897                   
   9  Biological Process    glucose metabolic process (GO:0006006)     4        0.00227
 10   Biological Process glycolytic process through glucose-6-phosphate (GO:0061620)    3  0.00241
 11   Biological Process    canonical glycolysis (GO:0061621)   3   0.00241
 12  Biological Process glucose catabolic process to pyruvate (GO:0061718)  3   0.00241

I want to re-arrange and order the bars according to their ontology. For example, all the biological process bars will be arranged together, followed by molecular function and cellular component. But I don't know how to go about that.

Entering edit mode

Please post expected image. See if following works


df3 %>%
    extract(GO.Term, into = c("Term", "id"), "(.*)\\s\\((.*)\\)") %>%
    mutate(Term=stri_trans_totitle(Term)) %>%
    ggplot(aes(Count, reorder(Term,Count), fill = Class)) +
    geom_bar(stat = "identity") +
    facet_wrap( ~ Class, nrow = 3, , scales = "free") +
    theme(legend.position = "bottom",
          axis.title.y = element_blank()
    ) +
        aes(label = paste(Count, ", Adj.P= ", round(Adjusted.P.value,4))),
        color = "black",
        size = 4,
        position = position_dodge(0.9)


Entering edit mode

It is working. I can't thank you enough for helping me in solving this issue.

Entering edit mode

I have updated the image/code and check it out.

Entering edit mode

Thank you so much. It seems my input dataset is missing few details as you have correctly pointed out. I am going to rearrange my dataset and follow the steps you've mentioned. Thank you again.

Entering edit mode

Hello, Could you help me please to do a similar graph? Instead of number of genes, I want to plot just the -log10(FDR) values.

Here is my data:

Go_category Go_term FDR
Biological_process  mitotic cell cycle process  32.67778071
Biological_process  cell proliferation  29.82390874
Biological_process  DNA replication 29.58502665
Biological_process  membrane invagination   24.65757732
Biological_process  cell cycle  22.74472749
Biological_process  microtubule-based process   20.09691001
Biological_process  microtubule-based movement  13.15490196
Biological_process  developmental process   3.958607315
Biological_process  wax metabolic process   3.795880017
Biological_process  cell wall organization or biogenesis    3.408935393
Biological_process  single-multicellular organism process   3.13076828
Biological_process  biological regulation   2.744727495
Biological_process  regulation of biosynthetic process  2.638272164
Biological_process  jasmonic acid metabolic process 2.638272164
Biological_process  phenylpropanoid metabolic process   2.236572006
Biological_process  flavonoid biosynthetic process  2.200659451
Molecular _Function microtubule motor activity  12.79588002
Molecular _Function hydrolase activity  6.958607315
Molecular _Function catalytic activity  4.552841969
Molecular _Function copper ion binding  4.552841969
Molecular _Function Molecular_function regulator    4.552841969
Molecular _Function enzyme regulator activity   4.356547324
Molecular _Function S-linalool synthase activity    3.823908741
Molecular _Function oxidoreductase activity 3.167491087
Molecular _Function chalcone isomerase activity 2
Cellular_component  extracellular region    28.1739252
Cellular_component  chromosome  12.67778071
Cellular_component  anchored component of membrane  12.38721614
Cellular_component  DNA packaging complex   11.43179828
Cellular_component  external encapsulating structure    10.39794001
Cellular_component  apoplast    3.537602002
Entering edit mode

Please post it is as a new question.


Login before adding your answer.

Traffic: 2595 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6