Question

How Can I Evaluate My Gene Clusters?

0

12.7 years ago

Zjk ▴ 40

I have prepared a gene distance matrix based on homology data. If my assumption is right, a smaller distance should mean that two genes are functionally similar. I can now build clusters from this distance matrix.

Now I want to test this assumption. I'm thinking about using Gene Ontology annotations or microarray co-expression, but I'm still not sure which data to use. Also, what kind of test or method should I use?

Any suggestions will be appreciated!

r gene function clustering • 2.7k views

ADD COMMENT • link updated 12.7 years ago by Paul_Muller ▴ 70 • written 12.7 years ago by Zjk ▴ 40

0

Is the gene distance matrix based on sequence similarity? I think sequence similarity can only get you so far, and showing that genes share similar GO annotations or are co-expressed will not prove that they have similar functions. These are all indirect evidence.

ADD REPLY • link 12.7 years ago by Vitis ★ 2.5k

0

Yes, it is based on sequence similarity. I want to test whether this distance can tell us something about gene function.

ADD REPLY • link 12.7 years ago by Zjk ▴ 40

score 1 · Answer 1 · 2011-08-01

Proteins tend to have similar functions when they have similar structures, and the same structure can be encoded in many different ways by DNA. So you might not get very far with this.

One of the ways that people actually evaluate semantic similarity measures is to compare them with the sequence similarity of the same gene pairs. Lord et al. ("Semantic similarity measures as tools for exploring the Gene Ontology", PSB 2003) tested a variety of similarity measures this way using linear correlation, and Wang et al. ("Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships") did the same with gene expression data. However, GO:MF performed better when sequence similarity was used, and GO:BP performed better when gene expression was used.
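As a rough illustration of the linear-correlation comparison mentioned above, here is a minimal Python sketch. The numbers are made up for the example and are not real data, and the names `pearson`, `seq_distance` and `go_similarity` are hypothetical. It assumes you already have, for each gene pair, one sequence distance and one functional similarity score (for example a GO semantic similarity).

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Toy values, one entry per gene pair (illustrative only, not real measurements).
seq_distance = [0.1, 0.2, 0.4, 0.7, 0.9]       # from the distance matrix
go_similarity = [0.95, 0.8, 0.6, 0.35, 0.1]    # e.g. a GO-based similarity score

r = pearson(seq_distance, go_similarity)
print(f"Pearson r = {r:.3f}")  # a strongly negative r means distance tracks dissimilarity
```

If the distance really is a proxy for function, you would expect a clearly negative correlation here. How strong it needs to be is a judgment call, and it may well differ between GO:MF and GO:BP.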

So, you should remember that these are all proxies for functional similarity: a smaller distance between sequences SHOULD (but definitely not always) equate to a higher functional similarity. Test-wise, I would probably do something like permutation testing over the clusters that you have. Is the mean observed functional similarity within a cluster different from what you'd expect by chance? Randomly reassign the genes among the clusters, recompute the functional similarity, repeat this many times to generate a null distribution, and then compare your observed value against that distribution.

Anyway, in my head it seems like a good way to go!
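The permutation test described above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function names, the toy genes, and the similarity values are hypothetical, and `sim` stands in for whatever functional similarity you choose (GO-based, co-expression, etc.). Cluster sizes are kept fixed by shuffling the cluster labels across genes.

```python
import random
from itertools import combinations
from statistics import mean

def mean_within_cluster_similarity(labels, sim):
    """Average similarity over all gene pairs that share a cluster."""
    clusters = {}
    for gene, label in labels.items():
        clusters.setdefault(label, []).append(gene)
    scores = [sim[frozenset(pair)]
              for members in clusters.values()
              for pair in combinations(members, 2)]
    return mean(scores)  # raises if no cluster has at least two genes

def permutation_test(labels, sim, n_perm=1000, seed=0):
    """Shuffle cluster labels to build a null distribution of the statistic."""
    rng = random.Random(seed)
    observed = mean_within_cluster_similarity(labels, sim)
    genes = list(labels)
    cluster_ids = [labels[g] for g in genes]
    null = []
    for _ in range(n_perm):
        shuffled = cluster_ids[:]
        rng.shuffle(shuffled)  # same cluster sizes, random membership
        null.append(mean_within_cluster_similarity(dict(zip(genes, shuffled)), sim))
    # One-sided empirical p-value, with +1 so it can never be exactly zero.
    p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, null, p_value

# Toy example: two clusters of three genes, high similarity within each cluster.
genes = ["a1", "a2", "a3", "b1", "b2", "b3"]
labels = {g: g[0] for g in genes}
sim = {frozenset(p): (0.9 if p[0][0] == p[1][0] else 0.1)
       for p in combinations(genes, 2)}

observed, null, p_value = permutation_test(labels, sim)
print(f"observed = {observed:.2f}, p = {p_value:.3f}")
```

With only six genes the p-value cannot get very small, because a fair share of random shuffles recreates the original grouping. On a real dataset with many genes the null distribution is much more informative.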

Ram · Answer 2 · 2011-08-15

0

12.7 years ago

Paul_Muller ▴ 70

Hello,

A paper recently published in PLoS Computational Biology did something very similar (in content) to what you are proposing. See Nehrt et al., "Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals".

They argue that orthology is a poor predictor of function.

Perhaps this will aid in shaping your project.

Best, Paul

ADD COMMENT • link updated 4.6 years ago by Ram 43k • written 12.7 years ago by Paul_Muller ▴ 70