Question: Clustering Method For Coexpressed Genes
1
8.1 years ago by
RT340
European Union
RT340 wrote:

Hi All,

Can anyone suggest the best clustering algorithm for checking the coexpression of genes? I used the k-means clustering algorithm, but I suspect it does not cluster correctly.

I have microarray data for a total of 10 samples from different conditions/tissues. When I cluster the data from all 10 samples, I get different results than when I cluster the data for just one sample: the same genes end up in different clusters in the two cases. I would expect some variation, but the results are entirely different.

Please help; I am new to this.

Thanks, Ritu

gene clustering • 2.8k views
modified 5.8 years ago by karl.stamm3.6k • written 8.1 years ago by RT340
1

@Ritu: how do you cluster one sample?

written 8.1 years ago by Steve Lianoglou5.0k

You need to add a little more information to your question before we can reasonably answer it, such as: Which program/package are you using for the clustering? Which distance metric? What value of K are you using, and how did you choose it? Is it 10 samples per condition/tissue or 10 samples in total? How many replicates per condition? Why do you believe the clustering is 'incorrect'?

Also, getting different results when running 10 samples and when running 1 sample is not very informative. Microarrays have lots of noise and when clustering based on one array you may just be clustering noise.
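
To illustrate why a single array says little about co-expression, here is a minimal Python sketch. The two gene profiles are made-up values, not taken from the question's data. The genes have almost the same intensity in sample 1, so clustering on that sample alone would tend to group them, yet their profiles across all 10 samples move in opposite directions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log-intensities for two genes across 10 samples.
gene_a = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
gene_b = [8.1, 7.6, 7.1, 6.6, 6.1, 5.6, 5.1, 4.6, 4.1, 3.6]

# Looking at sample 1 only, the genes are nearly indistinguishable.
single_sample_gap = abs(gene_a[0] - gene_b[0])
# Across all samples, one rises while the other falls.
profile_corr = pearson(gene_a, gene_b)

print(f"difference in sample 1: {single_sample_gap:.2f}")
print(f"correlation over 10 samples: {profile_corr:.2f}")
```

With one sample, each gene is a single number, so any clustering can only bin genes by intensity level; it cannot see whether genes vary together.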

written 8.1 years ago by Will4.5k

"When I cluster data from all the 10 samples then it gives different results than if I cluster data for just one sample." Totally overlooked this sentence. But that statement seems odd. I think you need to explain better what your question is.

written 8.1 years ago by Michael Dondrup47k
6
8.1 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

This has been said very often:

  1. There is no general best clustering algorithm.
  2. Cluster analysis is an exploratory technique, so the best algorithm for your data is the one that helps you make a novel discovery that leads to an interesting hypothesis.
  3. Therefore, you have to try out many different supervised and unsupervised methods (k-means, hierarchical clustering (which in addition offers many different distance measures and inter-cluster distance measures), fuzzy clustering, model-based clustering, PCA, ICA, LDA, QDA, ...).
  4. The outcome of k-means depends on the initial centroid vectors, which are usually chosen at random, so results can differ between runs. You have to run the algorithm multiple times.
  5. Keep a close eye on the biological background and question; your analysis must make sense in that respect.
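
The point about running k-means multiple times can be sketched as follows. This is a minimal pure-Python illustration on simulated data, not a recommendation of a specific tool. The function names (`sq_dist`, `assign`, `kmeans`), the three template profiles, and the choice of 20 restarts are all assumptions made for the example. Each run starts from different randomly chosen centroids, and the run with the lowest within-cluster sum of squares (WCSS) is kept.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, seed, n_iter=100):
    """Lloyd's k-means from a seeded random start; returns (labels, WCSS)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss

# Hypothetical data: 90 genes x 10 samples, built around three made-up profiles.
data_rng = random.Random(42)
templates = [[0.0] * 10, [2.0] * 5 + [-2.0] * 5, [-2.0] * 5 + [2.0] * 5]
genes = [[v + data_rng.gauss(0, 1.0) for v in templates[i % 3]] for i in range(90)]

# Run k-means from 20 different random starts and keep the tightest solution.
runs = [kmeans(genes, k=3, seed=s) for s in range(20)]
best_labels, best_wcss = min(runs, key=lambda run: run[1])
print("WCSS per run:", sorted(round(w, 1) for _, w in runs))
print("best WCSS:", round(best_wcss, 1))
```

If the per-run WCSS values differ noticeably, the solution is sensitive to initialization, which is one reason a single k-means run should not be trusted on its own.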
written 8.1 years ago by Michael Dondrup47k
1
5.8 years ago by
Ares Cao20
Shanghai China
Ares Cao20 wrote:

I think you need to try various clustering methods to see whether an obvious pattern emerges.

written 5.8 years ago by Ares Cao20
0
5.8 years ago by
NetunoPoncã160
Brazil
NetunoPoncã160 wrote:

Hi there,

Some algorithms perform better than others, but in general there is no clear overall winner for all datasets. I would suggest looking at some comparative analysis papers, such as:

Clustering cancer gene expression data: a comparative study

Also consider that the distance measure you use with the clustering algorithm may affect the quality of your results; see, for instance:

On the selection of appropriate distances for gene expression data clustering
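
As a small illustration of how much the distance measure matters, the sketch below compares Euclidean distance with correlation distance (1 minus the Pearson correlation) on three made-up profiles. The gene names and values are hypothetical, and these two measures are only examples, not necessarily the ones evaluated in the paper above.

```python
import math

def euclidean(x, y):
    """Euclidean distance: sensitive to absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical profiles over five samples.
gene_low = [1.0, 2.0, 3.0, 2.0, 1.0]           # peaks in the middle sample
gene_high = [10.0, 20.0, 30.0, 20.0, 10.0]     # same shape, ten times the level
gene_opposite = [3.0, 2.0, 1.0, 2.0, 3.0]      # similar level, inverted shape

for name, other in [("gene_high", gene_high), ("gene_opposite", gene_opposite)]:
    print(name,
          "euclidean:", round(euclidean(gene_low, other), 2),
          "correlation distance:", round(correlation_distance(gene_low, other), 2))
```

Under Euclidean distance, `gene_low` is closer to `gene_opposite`; under correlation distance, it is closer to `gene_high`. Which answer is "right" depends on whether co-expression is meant as similar level or similar pattern.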

Hope it helps,

Cheers!

modified 8 weeks ago by RamRS25k • written 5.8 years ago by NetunoPoncã160
0
5.8 years ago by
karl.stamm3.6k
United States
karl.stamm3.6k wrote:

It's not clear what you want to do with the data. Of course the data will look different depending on how you view them. These are very high-dimensional data, so there are many ways to turn them into a two-dimensional figure. A PCA plot is one view that captures maximum variation, but you could pick any two dimensions to make a scatter plot and run k-means on it.

For ready-made tools, take a look through the algorithms available in Bioconductor, and search for your particular microarray chip to see whether there is gene annotation aligned with the probes you have.
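
A rough sketch of the PCA view mentioned above, in pure Python, is shown here. It uses power iteration on the sample covariance matrix as a stand-in for a library PCA routine. The simulated genes, the `trend` and `switch` patterns, and all function names are assumptions made for the illustration.

```python
import math
import random

def center_columns(rows):
    """Subtract each sample's (column's) mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of column-centred data (columns = samples)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvector(mat, rng, n_iter=500):
    """Power iteration: leading eigenvector and eigenvalue of a symmetric matrix."""
    d = len(mat)
    vec = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(n_iter):
        nxt = [sum(mat[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, value

# Hypothetical data: 60 genes x 10 samples, mixing two made-up patterns plus noise.
rng = random.Random(0)
trend = [i - 4.5 for i in range(10)]                    # gradient across samples
switch = [1.0 if i % 2 else -1.0 for i in range(10)]    # alternating pattern
genes = []
for _ in range(60):
    a, b = rng.gauss(0, 1.0), rng.gauss(0, 1.0)
    genes.append([a * t + b * s + rng.gauss(0, 0.1) for t, s in zip(trend, switch)])

centered = center_columns(genes)
cov = covariance(centered)
pc1, var1 = top_eigenvector(cov, rng)
# Deflate: remove the first component, then find the next one.
deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(10)] for i in range(10)]
pc2, var2 = top_eigenvector(deflated, rng)

# Two-dimensional coordinates for each gene, ready for a scatter plot.
scores = [[sum(v * w for v, w in zip(row, pc)) for pc in (pc1, pc2)] for row in centered]
print("variance along PC1 and PC2:", round(var1, 2), round(var2, 2))
```

The `scores` list holds one (PC1, PC2) pair per gene. Clustering those pairs gives one two-dimensional view of the data; other projections can give different groupings.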

modified 8 weeks ago by RamRS25k • written 5.8 years ago by karl.stamm3.6k