Question: Clustering Genes Based On Function
6.2 years ago by
jellevandewege90 wrote:


We would like to use either hierarchical or k-means clustering to cluster the genes in our dataset based on their function. We have the GO ID for each gene, and we would now like to cluster the genes into groups based on function, preferably hierarchically. That is, from the bottom (where each function is unique) up to higher levels (where functions are grouped into more general categories).

Thanks in advance for your help!

gene function • 2.2k views
modified 6.2 years ago by pld4.8k • written 6.2 years ago by jellevandewege90

You might want to check the R/BioC packages GOSemSim and csbl.go. They use semantic similarity measures to do GO-term-based clustering.

written 6.2 years ago by Diwan570

I think you need to read about and understand the theory behind clustering approaches such as k-means. The "means" part indicates that you need some numbers (quantitative measurements).
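
To illustrate the point about needing numbers: GO IDs are labels, so they first have to be turned into a quantitative distance between genes before any clustering can run. The sketch below is one simple way to do that, assuming each gene has a set of GO IDs. It uses Jaccard distance on the term sets, which is a stand-in and not the semantic similarity measures mentioned earlier in the thread, followed by a naive average-linkage hierarchical clustering. The gene names and annotations are made-up toy data.

```python
from itertools import combinations

# Hypothetical gene -> GO ID annotations (toy data, not from the thread).
annotations = {
    "geneA": {"GO:0006412", "GO:0003735"},
    "geneB": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "geneC": {"GO:0006281"},
    "geneD": {"GO:0006281", "GO:0006974"},
}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0 means identical term sets, 1 means disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(c1, c2):
    """Mean pairwise distance between the genes of two clusters."""
    total = sum(
        jaccard_distance(annotations[g1], annotations[g2])
        for g1 in c1
        for g2 in c2
    )
    return total / (len(c1) * len(c2))

def agglomerate(genes):
    """Naive bottom-up clustering; returns the merges in the order they happen."""
    clusters = [frozenset([g]) for g in sorted(genes)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        merges.append((sorted(c1), sorted(c2), average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

merges = agglomerate(annotations)
for left, right, dist in merges:
    print(left, "+", right, f"at distance {dist:.2f}")
```

The list of merges is effectively the dendrogram read from the leaves upward. Jaccard distance ignores the relationships between GO terms, so two genes annotated with closely related but non-identical terms look completely different here; that is the gap a semantic similarity measure is meant to close.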

written 6.2 years ago by Neilfws48k

I agree. They could cluster all of the genes linked to a given GO term (using some homology measure), but that would only be clustering within a single GO term.

written 6.2 years ago by pld4.8k
6.2 years ago by
United States
pld4.8k wrote:

A quick note: most genes will have multiple GO terms mapped to them. Also, GO already has a hierarchical structure; that is how it was designed. So the thing to do would be to visualize the structure of the terms enriched in your data and then build your gene clustering off of that tree.

However, unless you prune the annotations for a given gene down to a single GO term, you will get a weird-looking clustering, since a given gene will end up in multiple clusters. I am not sure I see an advantage in pruning annotations, though; you'll end up with really biased data.
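
As a rough illustration of both points (the hierarchy already exists, and genes land in several groups), here is a minimal sketch that walks each gene's annotations up a tiny, made-up ontology and groups genes under every term they reach. The term names, parent links, and gene annotations are all hypothetical placeholders for real GO data.

```python
from collections import defaultdict

# Hypothetical miniature ontology: child term -> parent terms.
# Term names stand in for real GO IDs; the real GO graph is far larger.
parents = {
    "translation": {"biosynthesis"},
    "dna_repair": {"dna_metabolism", "stress_response"},
    "biosynthesis": {"biological_process"},
    "dna_metabolism": {"biological_process"},
    "stress_response": {"biological_process"},
    "biological_process": set(),
}

# Hypothetical direct annotations; geneB carries two unrelated terms.
direct = {
    "geneA": {"translation"},
    "geneB": {"translation", "dna_repair"},
    "geneC": {"dna_repair"},
}

def with_ancestors(term):
    """Return the term plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

# Group genes under every term they are annotated to, directly or via a descendant term.
genes_by_term = defaultdict(set)
for gene, terms in direct.items():
    for term in terms:
        for ancestor in with_ancestors(term):
            genes_by_term[ancestor].add(gene)

for term in sorted(genes_by_term):
    print(term, sorted(genes_by_term[term]))
```

In this toy data geneB shows up under both the translation branch and the DNA repair branch, which is the multiple-membership effect described above. Groups get broader toward the root, where every gene is included.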

written 6.2 years ago by pld4.8k

Thanks for the information!

written 6.2 years ago by jellevandewege90