Question: How To Evaluate A Newly Developed Clustering Algorithm?
7.3 years ago by
United States
ftp 140 wrote:


I developed a new clustering algorithm for gene expression data, intended for gene function prediction. I'd like to assess the validity of my method on some biological datasets. Is there a dataset that lists genes with similar functions? I want to check whether genes with similar functions are clustered together.


gene expression clustering • 1.3k views
modified 7.3 years ago by Istvan Albert ♦♦ 85k • written 7.3 years ago by ftp 140

If you are clustering on gene expression levels alone, I fail to see why two genes with similar function would cluster together. The fact that two genes are both kinases does not mean they will have similar expression levels; most likely they won't. At least, I have never seen evidence of that.

Also, how do you define "similar function"? Do two genes have similar function if they're both kinases? If they're part of the same pathway? If they're both transmembrane? And so on. If you define similar function by pathway, there are plenty of data sets.

written 7.3 years ago by David Westergaard 1.4k

Can you direct me to one of the pathway datasets? And yes, I define similar function by pathway.

modified 7.3 years ago • written 7.3 years ago by ftp 140

MSigDB is probably the easiest to parse. It includes pathways from both KEGG and REACTOME.
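
As a rough sketch of how such pathway sets could be used: MSigDB distributes gene sets as GMT files, where each line holds a set name, a description, and then the member genes, all tab-separated. The snippet below parses lines in that layout and scores the overlap between one cluster and each pathway with a hypergeometric tail probability. The function names (`parse_gmt`, `hypergeom_pvalue`), the toy gene and pathway names, and the choice of the hypergeometric test are all assumptions made for illustration, not something prescribed in this thread.

```python
from math import comb


def parse_gmt(lines):
    """Parse GMT lines: set name, description, then member genes (tab-separated)."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def hypergeom_pvalue(cluster, pathway, universe):
    """P(overlap >= observed) if the cluster's genes were drawn at random from universe."""
    cluster = cluster & universe   # only count genes that were actually clustered
    pathway = pathway & universe
    N, K, n = len(universe), len(pathway), len(cluster)
    k = len(cluster & pathway)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy input; gene and pathway names are made up for illustration.
gmt_lines = [
    "TOY_PATHWAY_A\tna\tg0\tg1\tg2\tg3\tg4\n",
    "TOY_PATHWAY_B\tna\tg5\tg6\tg7\n",
]
gene_sets = parse_gmt(gmt_lines)
universe = {f"g{i}" for i in range(20)}     # all genes that went into the clustering
cluster = {"g0", "g1", "g2", "g3", "g10"}   # one cluster from the algorithm under test

for name, pathway in gene_sets.items():
    print(name, hypergeom_pvalue(cluster, pathway, universe))
```

With a real GMT file, an open file handle can be passed to `parse_gmt` in place of the list. When many cluster and pathway pairs are tested, the resulting p-values would need a multiple-testing correction before being interpreted.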

ADD REPLYlink written 7.3 years ago by David Westergaard1.4k
7.3 years ago by
Istvan Albert ♦♦ 85k
University Park, USA
Istvan Albert ♦♦ 85k wrote:

The way to evaluate the algorithm is to generate random data, cluster it, and then verify that the data points in each cluster do indeed belong there.

Trying to evaluate the quality of clustering from a second, possibly unrelated attribute is not the correct approach.
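
One way to read the suggestion above is to simulate data with planted groups, so the true membership is known, and then compare the recovered clusters against it. The sketch below does that with the adjusted Rand index, which is 1 for perfect agreement and close to 0 for random labelings. Everything here is an assumption for illustration: the 2-D Gaussian blobs, the function names, and in particular `toy_kmeans`, which is only a stand-in for the algorithm under test.

```python
import random
from collections import Counter
from math import comb


def make_planted_data(n_per_cluster=30, centers=((0, 0), (20, 0), (0, 20)), sd=1.0, seed=0):
    """Simulate 2-D points around known centers; returns (points, true_labels)."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(label)
    return points, labels


def adjusted_rand_index(truth, predicted):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


def toy_kmeans(points, k, n_iter=20):
    """Stand-in for the algorithm under test: k-means with farthest-point start."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(d2(p, c) for c in centers)))
    for _ in range(n_iter):
        assign = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assign


points, truth = make_planted_data()
predicted = toy_kmeans(points, k=3)        # replace with the new algorithm
ari = adjusted_rand_index(truth, predicted)

# Negative control: randomly shuffled labels should score near zero.
shuffled = truth[:]
random.Random(1).shuffle(shuffled)
ari_control = adjusted_rand_index(truth, shuffled)
print(ari, ari_control)
```

Making the planted groups overlap more (larger `sd`, closer centers) shows how quickly recovery degrades, which is usually more informative than a single well-separated case.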

ADD COMMENTlink written 7.3 years ago by Istvan Albert ♦♦ 85k

OK, I understand that and I will definitely do that. However, it's also important to check the validity of the method when applied to gene expression, as the goal is to have a set of clusters where genes belonging to the same cluster share similar function. Is there a dataset that gives me genes that have known functions?

written 7.3 years ago by ftp 140