Hello,

We would like to use either hierarchical or k-means clustering to cluster the genes in our dataset by function. We have the GO ID for each gene, and we would now like to group the genes by function, preferably hierarchically: from the bottom level (where each function is unique) up to higher levels (where functions are grouped into more general categories).

Thanks in advance for your help!

You might want to check the R/BioC packages GOSemSim and csbl.go. They use semantic similarity measures to do GO-term-based clustering.
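To make the idea concrete, here is a rough sketch of similarity-based hierarchical clustering in plain Python. It is not what GOSemSim or csbl.go do internally: it assumes a simple Jaccard distance on each gene's set of GO terms as a stand-in for a real semantic similarity measure, and the gene names, GO IDs, and function names are all made up for illustration.

```python
from itertools import combinations

# Hypothetical gene -> GO term annotations (IDs are placeholders, not real terms).
gene_go = {
    "geneA": {"GO:0000001", "GO:0000002"},
    "geneB": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "geneC": {"GO:0000010"},
    "geneD": {"GO:0000010", "GO:0000011"},
}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; an assumed stand-in for a semantic distance."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, dist):
    """Mean pairwise distance between the genes of two clusters."""
    total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(annotations):
    """Agglomerative clustering; returns the merges in the order they happen."""
    genes = sorted(annotations)
    # Precompute all pairwise gene distances once.
    dist = {
        frozenset((g, h)): jaccard_distance(annotations[g], annotations[h])
        for g, h in combinations(genes, 2)
    }
    clusters = [(g,) for g in genes]  # start with one cluster per gene
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], dist),
        )
        d = average_linkage(clusters[i], clusters[j], dist)
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = hierarchical_clustering(gene_go)
for left, right, d in merges:
    print(left, "+", right, f"merged at distance {d:.2f}")
```

The list of merges is the bottom-up hierarchy asked for in the question: early merges join genes with nearly identical annotations, later ones join broader groups. For real data you would probably replace the Jaccard distance with a similarity computed by one of the packages above.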

I think you need to read about and understand the theory behind clustering approaches such as k-means. The "means" part indicates that you need some numbers (quantitative measurements) to average, and a GO ID is a categorical label, not a measurement.

I agree. They could cluster all of the genes linked to a given GO term (using some homology measure), but that would only be clustering within a GO term.