Dear all, I have performed hierarchical clustering of bisulfite DNA methylation data with the R package pvclust, using two different dissimilarity measures: Euclidean distance and Pearson correlation. However, the tree structures produced by the two measures are different. My question is: which one should I use for hierarchical clustering of bimodally distributed DNA methylation data? Are there any published papers that have already compared different dissimilarity measures?
Thanks in advance.
As has often been said before (here and elsewhere), trying to recommend a single best method for data mining is futile, given the lack of a gold standard to compare your results against. Clustering is exploratory and used for hypothesis generation, so the way to go is to apply many different methods (including other clustering methods, e.g. kmeans and Mclust) and to evaluate the results in the light of your biological knowledge. Also use downstream analyses, e.g. GO analysis, pathway analysis and GSEA.
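To see why the two trees can legitimately differ, here is a minimal sketch with made-up numbers (not real methylation data, and plain Python rather than pvclust, just to keep it self-contained). It assumes three hypothetical samples A, B and C with beta values at six CpG sites, and takes 1 - r as the correlation-based dissimilarity. B has the same pattern as A but compressed towards 0.5, while C is a noisy copy of A.

```python
import math

def euclidean(x, y):
    # Straight-line distance: sensitive to absolute differences in level.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_dissimilarity(x, y):
    # 1 - Pearson r: sensitive to the shape of the profile, not its scale.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical beta values, invented for illustration only.
samples = {
    "A": [0.10, 0.90, 0.10, 0.90, 0.10, 0.90],
    "B": [0.40, 0.60, 0.40, 0.60, 0.40, 0.60],  # same pattern as A, compressed
    "C": [0.30, 0.80, 0.05, 0.95, 0.25, 0.70],  # noisy copy of A
}

def nearest(name, dist):
    # Return the other sample with the smallest dissimilarity to `name`.
    others = [k for k in samples if k != name]
    return min(others, key=lambda k: dist(samples[name], samples[k]))

nearest_euclidean = nearest("A", euclidean)
nearest_correlation = nearest("A", pearson_dissimilarity)

print("Nearest to A (Euclidean):  ", nearest_euclidean)
print("Nearest to A (1 - Pearson):", nearest_correlation)
```

With these numbers, A pairs with C under Euclidean distance and with B under the correlation-based dissimilarity, so the first merge of an agglomerative tree already differs. Neither answer is wrong: the two measures ask different questions (similar absolute levels versus similar profile shape), which is why the choice has to come from the biology.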