Question: Clustering GO Terms?
9.3 years ago, Darren J. Fitzpatrick (Ireland/United Kingdom) wrote:

Given a set of genes - does anybody have a simple suggestion for clustering such a set on the basis of GO terms (generally just interested in biological processes)?

I have a very stringently filtered data set and need a preliminary view of the types of biological processes represented in my reduced data set.

Thanks, D.

9.3 years ago, Christian Pérez-Llamas wrote:

It looks like what you want to do is an enrichment analysis for GO terms. In our lab we have developed a tool that lets you run the enrichment analysis in a few easy steps. It is called Gitools and you can find it at

First you would need to download the GO terms genesets (which you can do within the tool) and then run the enrichment with your set of genes and the previously downloaded genesets (or modules in Gitools nomenclature).
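
As a rough illustration of what an enrichment run computes, here is a minimal Python sketch of a one-sided hypergeometric test applied to each gene set. This is a common choice for this kind of analysis, not necessarily what Gitools does internally. The gene names, term IDs, and the function names `hypergeom_pvalue` and `enrich` are made up for the example, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(study, population, gene_sets):
    """Return (term, overlap, p-value) tuples sorted by ascending p-value."""
    population = set(population)
    study = set(study) & population
    results = []
    for term, members in gene_sets.items():
        members = set(members) & population
        k = len(study & members)
        p = hypergeom_pvalue(k, len(study), len(members), len(population))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


# Toy data: hypothetical gene names and term IDs.
population = [f"g{i}" for i in range(1, 21)]  # 20 background genes
gene_sets = {
    "GO:A": ["g1", "g2", "g3", "g4"],
    "GO:B": [f"g{i}" for i in range(5, 15)],
}
study = ["g1", "g2", "g3", "g5"]

results = enrich(study, population, gene_sets)
for term, overlap, p in results:
    print(term, overlap, round(p, 4))
```

With this toy data, `GO:A` ranks first because 3 of its 4 genes are in the study set. In a real analysis you would also correct the p-values for multiple testing.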

You can take a look at the tutorials available on the web to get started, and don't hesitate to contact the authors if you have any questions.


Reply from Michi (5.5 years ago): Tutorials are now found here:
9.3 years ago, Marina Manrique wrote:

Have you thought about performing GOSlim analyses? Here you can find what GOSlim stands for: "GO slims are cut-down versions of the GO ontologies containing a subset of the terms in the whole GO. They give a broad overview of the ontology content without the detail of the specific fine grained terms."

There are some GOSlim sets already defined (see link above), but you can always define your own set of GO terms to perform the GOSlim analysis.

In this kind of analysis you start with a set of GO terms and a set of selected terms that we'll call GOSlim set (for example). You then see (browsing the GO Graph) if each of the GO Terms is connected with any term of the GOSlim set. In other words, you translate all the GO terms you have initially into a set of selected (normally of interest) GO terms.
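
The mapping step described above can be sketched in a few lines of Python. This is only an illustration on a made-up graph: the term IDs, the `parents` dictionary, and the function names `ancestors` and `map_to_slim` are assumptions for the example, and real GO slim tools handle more detail (relationship types, choosing the most specific slim ancestor, and so on).

```python
def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def map_to_slim(terms, slim, parents):
    """Map each GO term to the slim terms found among its ancestors."""
    return {t: sorted(ancestors(t, parents) & slim) for t in terms}


# Toy graph with hypothetical IDs: child -> list of parents.
parents = {
    "GO:leaf1": ["GO:mid1"],
    "GO:leaf2": ["GO:mid1", "GO:mid2"],
    "GO:leaf3": ["GO:mid3"],
    "GO:mid1": ["GO:root"],
    "GO:mid2": ["GO:root"],
    "GO:mid3": ["GO:root"],
}
slim = {"GO:mid1", "GO:mid2"}  # the selected "GOSlim set"

mapping = map_to_slim(["GO:leaf1", "GO:leaf2", "GO:leaf3"], slim, parents)
for term, slim_terms in mapping.items():
    print(term, "->", slim_terms)
```

A term such as `GO:leaf3`, which has no ancestor in the slim set, maps to an empty list here. In practice you would usually include the root of each ontology in the slim set so that nothing is lost.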

HTH. Marina


Reply from Marina Manrique (9.3 years ago): I forgot to point out that, first of all, you need to know the GO annotation for your set of genes.

Reply from Marina Manrique (9.2 years ago): I'd like to show you this app to perform this GOSlim analysis in a user-friendly way. It's open source and freely available.
9.3 years ago, Treylathe (San Francisco) wrote:

DAVID can do that: it takes a list of genes and clusters them based on functional GO annotations. There are a lot of other tools there, and you can get quite fine-tuned, but that might serve your purposes.

9.3 years ago, Aswarren (Blacksburg, VA) wrote:

If you actually want to cluster genes based on GO terms, you need to calculate the semantic similarity between all pairs of genes and then cluster them. I know you can do this with GOSim (an R package) and a little help from one of R's clustering algorithms. The R package GOSemSim might also be useful, though I have not used it. You also need to decide which semantic similarity metric to use (not all are implemented in those packages).

To interpret the results of the clustering, or to just do enrichment analysis, I recommend using the Ontologizer. It is flexible and allows you to specify the ontology, the population set, the study set, and the annotations themselves. As for the enrichment method, I like MGSA, which is also implemented in the Ontologizer.
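
The "pairwise similarity, then cluster" recipe can be sketched in plain Python as follows. This does not use GOSim or GOSemSim: Jaccard overlap of annotation sets is a simple stand-in for a proper semantic similarity measure, and the average-linkage loop is a bare-bones substitute for a library clustering routine. Gene names, term IDs, the threshold, and the function names are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Set overlap between two annotation sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(annotations, threshold):
    """Average-linkage agglomerative clustering.

    Merging stops once no pair of clusters has a mean pairwise
    similarity of at least `threshold`.
    """
    sim = {
        frozenset(pair): jaccard(annotations[pair[0]], annotations[pair[1]])
        for pair in combinations(annotations, 2)
    }
    clusters = [[gene] for gene in annotations]

    def linkage(c1, c2):
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy annotations (assumed to already include ancestor terms).
annotations = {
    "geneA": {"GO:1", "GO:2", "GO:3"},
    "geneB": {"GO:1", "GO:2", "GO:4"},
    "geneC": {"GO:7", "GO:8"},
    "geneD": {"GO:7", "GO:8", "GO:9"},
}

clusters = cluster(annotations, threshold=0.4)
print(clusters)
```

Swapping `jaccard` for a different gene-to-gene similarity function changes the metric without touching the clustering loop, which is where the choice of semantic similarity measure mentioned above comes in.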

9.3 years ago, Istvan Albert (University Park, USA) wrote:

Usually this is done the other way around: you cluster or sub-select genes by some condition, then look for GO enrichment within the groups. You could first try that on your group, perhaps using MEV to do it.

The main problem (and this may already be solved in some publications that I am not aware of) with clustering directly by GO terms is defining a similarity metric that would properly characterize any two GO terms. Intuitively that just does not seem possible over more distant GO terms.
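
To make the concern about distant terms concrete, here is a small sketch of one commonly cited term-to-term measure, a Resnik-style similarity based on information content. It is an illustration under assumptions, not a recommendation: the graph, the annotations, and the function names are invented. On this toy data, two terms whose only shared ancestor is the root get a similarity of zero, which is the kind of behaviour that makes distant terms hard to compare.

```python
from math import log


def ancestors(term, parents):
    """Return the term itself plus every term reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def information_content(gene_terms, parents):
    """IC(t) = log(N / n_t), with n_t genes annotated to t or a descendant."""
    counts = {}
    for terms in gene_terms.values():
        propagated = set().union(*(ancestors(t, parents) for t in terms))
        for t in propagated:
            counts[t] = counts.get(t, 0) + 1
    n_genes = len(gene_terms)
    return {t: log(n_genes / c) for t, c in counts.items()}


def resnik(t1, t2, ic, parents):
    """IC of the most informative ancestor shared by both terms."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


# Toy graph and annotations with hypothetical IDs.
parents = {
    "GO:a1": ["GO:a"],
    "GO:a2": ["GO:a"],
    "GO:b1": ["GO:b"],
    "GO:a": ["GO:root"],
    "GO:b": ["GO:root"],
}
gene_terms = {
    "g1": {"GO:a1"},
    "g2": {"GO:a2"},
    "g3": {"GO:b1"},
    "g4": {"GO:b1"},
}

ic = information_content(gene_terms, parents)
near = resnik("GO:a1", "GO:a2", ic, parents)  # siblings under GO:a
far = resnik("GO:a1", "GO:b1", ic, parents)   # share only the root
print(near, far)
```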

9.3 years ago, Chris Evelo (Maastricht, The Netherlands) wrote:

If what you want to do is indeed enrichment analysis for GO terms, you might want to check this question: The GO-Elite approach that I mentioned there is more or less the opposite of the GOSlim approach, as it finds the most distant leaves on the GO tree first. The other answers should be of interest as well.


Reply from Chris Evelo (7.9 years ago): Couldn't edit my own (old) post. I wanted to add that a GO-Elite paper has now been published. It is at:
9.2 years ago, Carl (DKFZ & Univ. Heidelberg, Heidelberg, Germany) wrote:


You might want to give SimCT a try, which does exactly this: it builds a tree based on similarities of GO annotations for a set of genes.

