Hierarchical Clustering
9.4 years ago
Diana ▴ 880

Hello everyone,

I'm using the pvclust package in R to cluster my gene expression data (hierarchical clustering with bootstrap). When I plot the result, all the branches collapse at the bottom and I can't see the clusters. Is there a way to improve the image? I tried scaling the data with the scale() function before clustering, but the resulting tree is a little different from the one produced with unscaled data. With unscaled data I get 3 outliers, but with scaled data the outliers are embedded in the clusters. I'm worried that scaling is distorting my data. Please help.

Image with unscaled data:

If I scale the data:

I get this tree:

Test data:

Gene     condition1     condition2     condition3
AATF     0.004239637    0.004565341    0.004992545
AP-2     0.029882702    0.016730296    0.020585824
AXIN2    0.001743115    0.002124558    0.003573409


Thank you!

r clustering • 2.8k views

Could you paste your code and the plot?

9.4 years ago
Wen.Huang ★ 1.2k

Do you really have to use Euclidean distance? When you scale your data, the scale and magnitude of the Euclidean distances change. For gene expression data, "correlation" is almost certainly the right way to measure distance.
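
For illustration, here is a minimal sketch of running pvclust with correlation distance instead of Euclidean distance. The matrix expr is made-up data, not the poster's, and the linkage method and number of bootstrap replicates are arbitrary choices for the example. pvclust clusters the columns of its input, so the items to be clustered are placed in columns here. The call is skipped if the pvclust package is not installed.

```r
# Made-up data (an assumption for this sketch): 20 rows of measurements,
# 8 columns to be clustered. pvclust clusters the COLUMNS of its input.
set.seed(1)
expr <- matrix(rexp(20 * 8, rate = 50), nrow = 20,
               dimnames = list(paste0("sample", 1:20), paste0("gene", 1:8)))

if (requireNamespace("pvclust", quietly = TRUE)) {
  # method.dist = "correlation" uses 1 - Pearson correlation between columns.
  fit <- pvclust::pvclust(expr,
                          method.dist   = "correlation",
                          method.hclust = "average",
                          nboot         = 100,   # small value to keep the sketch fast
                          quiet         = TRUE)
  plot(fit)                           # dendrogram annotated with bootstrap values
  pvclust::pvrect(fit, alpha = 0.95)  # boxes around clusters with AU p-value >= 0.95
}
```

With real data, nboot would normally be set much higher (the package default is 1000).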


Thank you, Huang, for your answer. Is there any paper or review you've come across that describes the correlation method as better than others for clustering gene expression data? I used the correlation method, and it does give the same tree with scaled or unscaled data. However, the genes cluster somewhat differently than with Euclidean distance, and I'm not sure which method is best.


Neither Euclidean nor correlation distance is inherently better or worse; it depends on which distance measure you believe best fits your data. But you definitely don't want to measure Euclidean distance on scaled gene expression data. If you really want a reference, Michael Eisen's 1998 PNAS paper is a good one. I believe he used correlation.
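
To make the scaled-versus-unscaled behaviour concrete, the following base R sketch compares the two distance measures before and after scale(). The matrix is made-up data, and it assumes that scale() and the clustering act on the same columns (scale() standardizes each column, and pvclust clusters columns). Under that assumption, correlation distance is unaffected by scaling, while Euclidean distance is not.

```r
# Made-up data (an assumption for this sketch): 10 rows, 5 columns whose
# magnitudes differ by several orders, as expression levels often do.
set.seed(42)
expr <- matrix(rexp(10 * 5), nrow = 10,
               dimnames = list(NULL, paste0("gene", 1:5)))
expr <- sweep(expr, 2, c(0.001, 0.01, 0.1, 1, 10), `*`)

scaled <- scale(expr)  # centers each column and divides it by its standard deviation

# Correlation distance between columns: 1 - Pearson correlation.
d_cor_raw    <- as.dist(1 - cor(expr))
d_cor_scaled <- as.dist(1 - cor(scaled))

# Euclidean distance between columns (dist() works on rows, hence t()).
d_euc_raw    <- dist(t(expr))
d_euc_scaled <- dist(t(scaled))

# Compare the distances before and after scaling.
cor_unchanged <- isTRUE(all.equal(as.vector(d_cor_raw), as.vector(d_cor_scaled)))
euc_unchanged <- isTRUE(all.equal(as.vector(d_euc_raw), as.vector(d_euc_scaled)))

print(c(correlation_unchanged = cor_unchanged,
        euclidean_unchanged   = euc_unchanged))
```

Because the correlation distances are identical in both cases, a tree built from them does not depend on whether the columns were scaled, which matches the observation above that the correlation method gives the same tree either way.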