Question: Clustering using codon usage similarity
4.3 years ago by Saad Khan, United States


I have a codon usage similarity matrix that I obtained from elsewhere. Most clustering algorithms start from data that look like the iris dataset: n rows as observations and x columns as features. Most R packages do not take a distance matrix directly; instead they apply their own distance function to the data, such as "euclidean" or "minkowski". Since I am starting directly from a distance matrix, I was wondering if someone could give me some insight into how to cluster the matrix in the first place and then how to get the optimal number of clusters from the data. Almost all the methods (R packages) described here do not accept a distance matrix as input. R packages like dbscan do accept such input, but then you have the problem of defining "eps: Reachability maximum distance" and "MinPts: Reachability minimum number of points" beforehand. So I was wondering if anyone who has gone through similar issues could provide examples and/or a workaround for my problem.

clustering codon usage • 1.4k views
modified 4.1 years ago by Biostar • written 4.3 years ago by Saad Khan

Hi, I am not sure I understand correctly, but if you already have your distances (the similarity matrix) and want to cluster directly on them instead of computing Euclidean distances, I think you can use as.dist:


# dist_mat: your square, symmetric matrix of pairwise distances (placeholder name)
HC <- hclust(as.dist(dist_mat))
written 4.3 years ago by Benn

Note that as.dist only coerces the matrix into a dist object. The content doesn't 'magically' become interpretable as a distance. If you have a matrix of similarities, you first need to convert it to distances (i.e. dissimilarities). Passing a similarity matrix where a distance matrix is expected will usually produce the wrong result, because a high distance means a low similarity and vice versa. There are various ways of converting a similarity into a distance; one simple option is D(i,j) = max(S) - S(i,j).
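
A minimal sketch of that conversion followed by hierarchical clustering. The matrix S and its values are made up for illustration, and the choice of average linkage and k = 2 is an assumption, not something prescribed in this thread:

```r
# Toy symmetric similarity matrix (made-up values, illustration only)
S <- matrix(c(1.0, 0.9, 0.2, 0.1,
              0.9, 1.0, 0.3, 0.2,
              0.2, 0.3, 1.0, 0.8,
              0.1, 0.2, 0.8, 1.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("sp", 1:4), paste0("sp", 1:4)))

# Convert similarities to dissimilarities: D(i,j) = max(S) - S(i,j)
D <- max(S) - S

# as.dist() only changes the container, so D must already hold distances
d <- as.dist(D)

# Average-linkage hierarchical clustering, then cut the tree into 2 groups
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
```

With real data, replace S with your own similarity matrix; plot(hc) shows the dendrogram, which can help when deciding where to cut.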

written 4.3 years ago by Jean-Karim Heriche

How about other, more robust methods (R packages) like k-means, PAM (k-medoids), mclust, etc.?
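
For reference, a sketch of how this could look with PAM: as far as I know, pam() in the cluster package accepts a dist object when diss = TRUE, whereas kmeans() and mclust expect a data matrix of coordinates rather than distances. The toy matrix below is made up, and picking k by the highest average silhouette width is one common heuristic, not the only way to choose the number of clusters:

```r
library(cluster)  # provides pam(); ships with standard R installations

# Toy distance matrix: two groups of three observations (made-up values)
D <- matrix(0.9, nrow = 6, ncol = 6)
D[1:3, 1:3] <- 0.2
D[4:6, 4:6] <- 0.2
diag(D) <- 0
d <- as.dist(D)

# Run PAM on the distances for several k and record the average silhouette width
ks <- 2:5
avg_sil <- sapply(ks, function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width)

# Keep the k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
fit <- pam(d, k = best_k, diss = TRUE)
print(best_k)
print(fit$clustering)
```

If a method strictly needs coordinates, one possible workaround is to embed the distances first (for example with cmdscale()) and cluster the embedded points, at the cost of some distortion.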

written 4.3 years ago by Saad Khan

I agree with the first comment, but we should respect the format of the distance object.

modified 4.1 years ago • written 4.1 years ago by Macherki M E