How to manage GO terms for metagenomics
19 months ago


I have been trying to retrieve GO terms for metagenomic data for a while. I have already retrieved all the GO IDs, but I can't group them into the main categories. Do you have any ideas on how to do this?

I followed the steps below.

1- Run InterProScan and extract the corresponding GO IDs
2- Map the GO IDs to go.obo (
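Step 1 above can be sketched roughly as follows. This is a minimal sketch, assuming InterProScan was run with TSV output and GO lookup enabled, so that GO annotations sit in the 14th tab-separated column. The function name `go_ids_from_interproscan` and the demo rows (including the `GO:99999xx` IDs) are made up for illustration. A regular expression is used so that any decoration around the IDs is ignored.

```python
import io
import re
from collections import defaultdict

# A GO ID is "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def go_ids_from_interproscan(handle):
    """Collect the set of GO IDs reported for each protein in InterProScan TSV rows."""
    protein_to_go = defaultdict(set)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # this row carries no GO column
        # Assumption: column 1 is the protein ID, column 14 holds the GO IDs.
        protein_to_go[fields[0]].update(GO_ID.findall(fields[13]))
    # Drop proteins whose GO column was present but empty ("-").
    return {protein: ids for protein, ids in protein_to_go.items() if ids}


# Made-up demo rows: the first has a GO column, the second stops at column 13.
demo = (
    "orf1\tmd5\t120\tPfam\tPF00001\tdesc\t1\t100\t1e-5\tT\t01-01-2024"
    "\tIPR000001\tdesc\tGO:9999901|GO:9999902\t-\n"
    "orf2\tmd5\t80\tPfam\tPF00002\tdesc\t1\t70\t1e-3\tT\t01-01-2024\t-\t-\n"
)
protein_to_go = go_ids_from_interproscan(io.StringIO(demo))
print(protein_to_go)
```

Keeping the IDs grouped per protein, rather than as one flat list, makes it possible to count proteins per category later.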

Now I want to narrow them down to their corresponding main categories. Basically, I want to produce something like the figure below.

Are there any R packages or tools for this purpose? There are some R packages (gprofiler, GOexpress, topGO), but they require a species parameter. Since my data is metagenomic, there is no single species to supply.

Thanks!

[Image: example of GO IDs]

[Image: Histogram 2]

Did you find a solution to your issue? I'm having the same problem :)

3 months ago

What would help you here is GO subsets (also known as GO slims): cut-down versions of the GO that contain a subset of its terms. The generic slim works in most cases, but if you are particularly interested in a specific branch, you can create a custom subset. Since you already have the GO IDs, you simply need to trace your terms up to their ancestors in the slim. You should be able to use oaklib to expand a slim:

runoak -i ~/tmp/go.obo expand-subsets -p i,p goslim_generic

Find more about the Ontology Access Kit (OAK).
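To show what "tracing your terms up in the slim" means, here is a minimal sketch in plain Python. It is not how OAK does it internally. It reads OBO-format text, records the `is_a` and `part_of` parents of each term, and collects every ancestor tagged with the chosen subset. The function names `parse_obo` and `slim_ancestors` and the `TOY:` ontology are made up for illustration. With a real go.obo you would pass the open file instead of the toy string.

```python
from collections import defaultdict


def parse_obo(lines, subset="goslim_generic"):
    """Return (parents, slim_terms) read from OBO-format text lines."""
    parents = defaultdict(set)  # term ID -> IDs reached via is_a / part_of
    slim_terms = set()          # term IDs tagged with the requested subset
    term_id, in_term = None, False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term, term_id = (line == "[Term]"), None
        elif not in_term:
            continue
        elif line.startswith("id: "):
            term_id = line[4:]
        elif term_id is None:
            continue
        elif line.startswith("is_a: "):
            parents[term_id].add(line.split()[1])
        elif line.startswith("relationship: part_of "):
            parents[term_id].add(line.split()[2])
        elif line == f"subset: {subset}":
            slim_terms.add(term_id)
    return parents, slim_terms


def slim_ancestors(term, parents, slim_terms):
    """Return every slim term reachable from `term`, including `term` itself."""
    seen, hits, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            hits.add(current)
        stack.extend(parents.get(current, ()))
    return hits


# Toy ontology with made-up IDs, only to exercise the functions.
TOY_OBO = """\
format-version: 1.2

[Term]
id: TOY:0000001
name: root process
subset: goslim_generic

[Term]
id: TOY:0000002
name: broad process
subset: goslim_generic
is_a: TOY:0000001 ! root process

[Term]
id: TOY:0000003
name: specific process
is_a: TOY:0000002 ! broad process

[Term]
id: TOY:0000004
name: component of broad process
relationship: part_of TOY:0000002 ! broad process

[Typedef]
id: part_of
name: part of
"""

parents, slim_terms = parse_obo(TOY_OBO.splitlines())
print(sorted(slim_ancestors("TOY:0000003", parents, slim_terms)))
```

Note that this sketch returns all slim ancestors of a term, including the root-level ones, so you may want to keep only the most specific hits or drop the three root terms before plotting.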

If you are instead starting from gene products of an annotated organism and/or have annotations in a GAF, you can use the GO Term Mapper and upload that GAF under Advanced Options, or use Map2Slim in OWLTools.

Also, importantly, you should not use percentages or a pie chart when reporting GO Slim counts.
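One way to see why percentages mislead: a single gene product can map to several slim terms, so the per-term counts add up to more than the number of genes and do not form parts of a whole. The sketch below reports plain counts as a text bar chart instead. It is a minimal illustration. The function name `genes_per_slim_term` and the toy mapping are made up.

```python
from collections import Counter


def genes_per_slim_term(gene_to_slim):
    """Count genes per slim term; a gene counts once for each term it maps to."""
    counts = Counter()
    for slim_terms in gene_to_slim.values():
        counts.update(set(slim_terms))
    return counts


# Made-up mapping from gene IDs to slim terms.
toy = {
    "orf1": {"termA", "termB"},
    "orf2": {"termA"},
    "orf3": {"termB", "termC"},
}

counts = genes_per_slim_term(toy)

# The counts sum to more than the number of genes, so they are not shares of a total.
print(f"genes: {len(toy)}, summed term counts: {sum(counts.values())}")

# Report raw counts, for example as a bar chart.
for term, n in counts.most_common():
    print(f"{term:<8}{n:>4} {'#' * n}")
```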


