Question

Clustering Go Terms?

7

13.2 years ago

Darren J. Fitzpatrick ★ 1.1k

Given a set of genes - does anybody have a simple suggestion for clustering such a set on the basis of GO terms (generally just interested in biological processes)?

I have a very stringently filtered data set and need a preliminary view of the types of biological processes represented in my reduced data set.

Thanks, D.

gene clustering • 16k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 13.2 years ago by Darren J. Fitzpatrick ★ 1.1k

Ram · Answer 1 · 2011-03-03

7

13.2 years ago

Christian Pérez-Llamas ▴ 90

It looks like what you want to do is an enrichment analysis for GO terms. In our lab we have developed a tool that lets you run the enrichment analysis in a few easy steps. It is called Gitools and you can find it at http://www.gitools.org.

First you would need to download the GO terms genesets (which you can do within the tool) and then run the enrichment with your set of genes and the previously downloaded genesets (or modules in Gitools nomenclature).
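For readers who want to see what such an enrichment computes, here is a minimal sketch using a one-sided hypergeometric test. This is not Gitools code and makes no claim about how Gitools implements its test; the gene IDs, the geneset contents, and the function names `hypergeom_enrichment_p` and `enrich` are all made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(population_size, annotated_in_population,
                           study_size, annotated_in_study):
    """Return P(X >= annotated_in_study) under a hypergeometric null.

    X is the number of annotated genes seen when drawing study_size genes
    without replacement from the population.
    """
    total = comb(population_size, study_size)
    upper = min(annotated_in_population, study_size)
    tail = sum(
        comb(annotated_in_population, i)
        * comb(population_size - annotated_in_population, study_size - i)
        for i in range(annotated_in_study, upper + 1)
    )
    return tail / total

def enrich(study, population, genesets):
    """Compute an enrichment p-value for every geneset (module)."""
    results = {}
    for term, members in genesets.items():
        members = members & population      # ignore genes outside the background
        hits = len(study & members)
        results[term] = hypergeom_enrichment_p(
            len(population), len(members), len(study), hits
        )
    return results

# Hypothetical toy data: 20 background genes, two GO genesets, 4 study genes.
population = {f"g{i}" for i in range(1, 21)}
genesets = {
    "term_small": {"g1", "g2", "g3", "g4"},
    "term_large": {"g1", "g5", "g6", "g7", "g8", "g9", "g10", "g11"},
}
study = {"g1", "g2", "g3", "g12"}

p_values = enrich(study, population, genesets)
for term, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{term}\t{p:.4f}")
```

In a real analysis you would also correct the p-values for multiple testing, since one test is run per GO term.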

You can take a look at the tutorials available on the website to get started, and don't hesitate to contact the authors with any questions.

ADD COMMENT • link 13.2 years ago by Christian Pérez-Llamas ▴ 90

1

Tutorials are now found here.

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.4 years ago by Michi ▴ 990

score 6 · Answer 2 · 2011-03-03

6

13.2 years ago

Marina Manrique ★ 1.3k

Have you thought about performing GOSlim analyses? Here you can find what GOSlim stands for: "GO slims are cut-down versions of the GO ontologies containing a subset of the terms in the whole GO. They give a broad overview of the ontology content without the detail of the specific fine grained terms."

There are some GOSlim sets already defined (see link above), but you can always define your own set of GO terms to perform the GOSlim analysis.

In this kind of analysis you start with a set of GO terms and a set of selected terms that we'll call GOSlim set (for example). You then see (browsing the GO Graph) if each of the GO Terms is connected with any term of the GOSlim set. In other words, you translate all the GO terms you have initially into a set of selected (normally of interest) GO terms.
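The mapping step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `parents` graph, the term names, and the functions `ancestors` and `map_to_slim` are hypothetical, and a real analysis would load the actual GO graph and a published or custom slim set instead.

```python
# Hypothetical toy ontology: each term maps to its direct parents.
parents = {
    "apoptosis": ["cell death"],
    "necrosis": ["cell death"],
    "glycolysis": ["carbohydrate metabolism"],
    "cell death": ["biological process"],
    "carbohydrate metabolism": ["metabolism"],
    "metabolism": ["biological process"],
    "biological process": [],
}

def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen

def map_to_slim(terms, slim, parents):
    """Translate each input term into the slim terms it is connected to."""
    return {term: sorted(ancestors(term, parents) & slim) for term in terms}

# The slim set is the reduced vocabulary you want to report in.
slim = {"cell death", "metabolism"}
mapping = map_to_slim(["apoptosis", "glycolysis", "biological process"], slim, parents)

for term, slim_terms in mapping.items():
    print(term, "->", slim_terms)
```

A term that sits above every slim term (here the root) maps to nothing, which is why slim sets usually include broad catch-all terms.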

HTH. Marina

ADD COMMENT • link 13.2 years ago by Marina Manrique ★ 1.3k

0

I forgot to point out that, first of all, you need to know the GO annotation for your set of genes.

ADD REPLY • link 13.2 years ago by Marina Manrique ★ 1.3k

0

I'd like to show you this app for performing the GOSlim analysis in a user-friendly way (http://blog.bio4j.com/?p=9). It's open source and freely available.

ADD REPLY • link 13.1 years ago by Marina Manrique ★ 1.3k

score 6 · Answer 3 · 2011-03-03

6

13.2 years ago

Treylathe ▴ 950

DAVID (http://david.abcc.ncifcrf.gov/) can do that: it takes a list of genes and clusters them based on functional GO annotations. There are a lot of other tools there, and you can fine-tune the analysis quite a bit, but that might serve your purposes.

ADD COMMENT • link 13.2 years ago by Treylathe ▴ 950

score 4 · Answer 4 · 2011-03-04

If you actually want to cluster genes based on GO terms you need to calculate the semantic similarity between all pairs and then cluster them. I know you can do this with GOSim http://goo.gl/YvqlL (an R package), and with a little help from one of R's clustering algorithms. Also, the R package GOSemSim http://goo.gl/DXwBS might be useful though I have not used it. You also need to decide what semantic similarity metric to use http://goo.gl/fMQYS (though not all are implemented in those packages). To interpret the results of the clustering, or to just do enrichment analysis, I recommend using the Ontologizer http://goo.gl/6ejVG. It is flexible and allows you to specify the ontology, the population set, the study set, and the annotations themselves. As for the enrichment method I like MGSA http://goo.gl/T1NWl which is also implemented in the Ontologizer.
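To make the "pairwise similarity, then cluster" idea concrete, here is a minimal sketch in plain Python rather than R. It does not use GOSim or GOSemSim and does not reproduce any of their metrics: as a stand-in, similarity is the Jaccard overlap of ancestor-closed annotation sets, and clustering is single linkage cut at a fixed cutoff (equivalent to taking connected components of the "similar enough" graph). The ontology, gene names, cutoff, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy ontology (term -> direct parents) and gene annotations.
parents = {
    "apoptosis": ["cell death"],
    "necrosis": ["cell death"],
    "cell death": ["biological process"],
    "glycolysis": ["metabolism"],
    "lipid metabolism": ["metabolism"],
    "metabolism": ["biological process"],
    "biological process": [],
}
annotations = {
    "geneA": {"apoptosis"},
    "geneB": {"necrosis"},
    "geneC": {"glycolysis"},
    "geneD": {"lipid metabolism"},
}

def closure(terms):
    """Return the given terms plus all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen

def gene_similarity(gene1, gene2):
    """Jaccard overlap of ancestor-closed annotation sets (a stand-in metric)."""
    a = closure(annotations[gene1])
    b = closure(annotations[gene2])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(genes, cutoff):
    """Single-linkage clustering at a fixed cutoff, using union-find."""
    root = {gene: gene for gene in genes}

    def find(gene):
        while root[gene] != gene:
            root[gene] = root[root[gene]]   # path compression
            gene = root[gene]
        return gene

    for gene1, gene2 in combinations(genes, 2):
        if gene_similarity(gene1, gene2) >= cutoff:
            root[find(gene1)] = find(gene2)

    groups = {}
    for gene in genes:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())

clusters = cluster(sorted(annotations), cutoff=0.4)
print(clusters)
```

Because every gene shares the root term, this overlap measure never reaches zero, which is one reason information-content-based metrics are often preferred for real data.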

score 2 · Answer 5 · 2011-03-03

Usually this is done the other way around: you cluster or sub-select genes by some condition, then look for GO enrichment within the groups. You could first try that on your group, perhaps using MEV.

The main problem (and this may already be solved in some publications that I am not aware of) with clustering directly by GO terms is defining a similarity metric that would properly characterize any two GO terms. Intuitively that just does not seem possible over more distant GO terms.

score 2 · Answer 6 · 2011-03-03

2

13.2 years ago

Chris Evelo 10k

If what you want to do is indeed enrichment analysis for GO terms, you might want to check this question. The GO_Elite approach that I mentioned there is more or less the opposite of the GOSlim approach, as it finds the most distant leaves on the GO tree first. The other answers should be of interest as well.

ADD COMMENT • link 13.2 years ago by Chris Evelo 10k

0

Couldn't edit my own (old) post. Wanted to add that a GO-Elite paper has now been published. It is at: http://dx.doi.org/10.1093/bioinformatics/bts366

ADD REPLY • link 11.8 years ago by Chris Evelo 10k

Ram · Answer 7 · 2011-03-08

1

13.1 years ago

Carl ▴ 80

Hi,

You might want to try SimCT (http://tagc.univ-mrs.fr/SimCT/), which does exactly this: it builds a tree based on similarities of GO annotations for a set of genes.

c

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 13.1 years ago by Carl ▴ 80