Hi everybody!
I have transcript clusters (hierarchical clustering) of differentially expressed genes generated by the Trinity pipeline. But I don't have any information about the GO terms of the genes in each cluster, and I don't even know how many genes belong to each cluster.
Therefore, I wonder if anybody has an idea how I can do k-means clustering based on the GO terms of the differentially expressed genes.
Really appreciate your help.
Do you have GO annotation for your transcriptome assembly at all? If not, you need to run an annotation pipeline first. For getting GO annotation, see "Annotating sequences after de-novo Trinity assembly and RSEM analysis...there must be an easier way!" or maybe "Transcriptome Analysis with only a fasta file".
You can't do k-means clustering of GO terms, because there is no Euclidean metric for GO terms. K-means needs to average points into a centroid, and the GO DAG is not a vector space (what's the centroid of "hydrolysis" and "DNA-repair"?).
See instead: "Clustering Go Terms?" or "Clustering Genes Based On Gene Ontology", and ftp://ftp.geneontology.org/go/www/GO.tools.microarray.shtml
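
To illustrate the point about centroids: a method that only needs pairwise distances can still group genes by their GO annotation. Below is a minimal sketch, not a recommended pipeline, that swaps k-means for k-medoids (cluster centres are actual genes, so nothing has to be averaged) and uses the Jaccard distance between GO term sets. The gene names, the GO term sets, and the helper names `jaccard_distance` and `k_medoids` are all made up for the example. Jaccard on flat term sets also ignores the DAG structure, so a proper GO semantic similarity measure would probably be a better distance in practice.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of GO term IDs."""
    union = a | b
    if not union:
        return 0.0  # two unannotated genes are treated as identical
    return 1.0 - len(a & b) / len(union)


def k_medoids(items, k, dist, max_iter=100):
    """Simple alternating k-medoids.

    items: dict mapping gene name -> set of GO term IDs.
    Returns a dict mapping each medoid gene -> list of member genes.
    Starts from the first k gene names (an arbitrary, deterministic choice;
    real use would try several starting points).
    """
    names = sorted(items)
    d = {(x, y): dist(items[x], items[y]) for x in names for y in names}
    medoids = names[:k]
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for n in names:
            nearest = n if n in medoids else min(medoids, key=lambda m: d[n, m])
            clusters[nearest].append(n)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(d[c, o] for o in members))
            for members in clusters.values()
        )
        if new_medoids == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy annotation table (hypothetical genes, illustrative GO IDs).
genes = {
    "geneA": {"GO:0006281", "GO:0006974"},
    "geneB": {"GO:0006281", "GO:0006974", "GO:0003677"},
    "geneC": {"GO:0006281"},
    "geneD": {"GO:0016787", "GO:0006508"},
    "geneE": {"GO:0016787", "GO:0006508", "GO:0008233"},
    "geneF": {"GO:0016787"},
}

clusters = k_medoids(genes, k=2, dist=jaccard_distance)
for medoid, members in clusters.items():
    print(medoid, len(members), members)
```

The same loop also answers the "how many genes per cluster" question for this kind of grouping, since `len(members)` is the cluster size. It says nothing about the sizes of the expression-based clusters Trinity produced.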