Question: Compute Affinity Matrix From Distance Matrix
ericmajinglong (United States) wrote, 7.6 years ago:

Hi guys,

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are all homologous to each other).
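To load such a matrix in Python, here is a minimal sketch. It assumes the file was written with something along the lines of `clustalo -i seqs.fasta --distmat-out=dist.mat --full` (check `clustalo --help` for your version) and that it uses a square, PHYLIP-style layout: the sequence count on the first line, then one row per sequence with a name (no spaces) followed by its distances. `parse_distmat` is a hypothetical helper, and the example values are made up.

```python
import numpy as np


def parse_distmat(text: str):
    """Parse a square, PHYLIP-style distance matrix (assumed layout).

    First line: number of sequences n. Then n lines, each holding a
    sequence name (no spaces) followed by n distances.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n = int(lines[0][0])
    names, rows = [], []
    for fields in lines[1:n + 1]:
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:n + 1]])
    D = np.array(rows)
    if D.shape != (n, n):
        raise ValueError(f"expected a {n}x{n} matrix, got {D.shape}")
    return names, D


# Made-up example in the assumed format.
example = """3
seqA 0.000000 0.200000 0.700000
seqB 0.200000 0.000000 0.650000
seqC 0.700000 0.650000 0.000000
"""
names, D = parse_distmat(example)
print(names)    # ['seqA', 'seqB', 'seqC']
print(D.shape)  # (3, 3)
```

If your file differs (for example, percent identity instead of distance), adjust the parser before converting to similarities.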

I want to use affinity propagation to cluster these sequences.

Initially, because I observed by inspection that the distance matrix only contained values between 0 and 1, with a distance of 0 meaning 100% identity, I reasoned that I could simply take (1 - distance) as the affinity.
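The (1 - distance) approach can be sketched with scikit-learn's `AffinityPropagation`, whose `affinity="precomputed"` option accepts a similarity matrix directly. The toy distances below are made up (six points on a line, so all distances fall in [0, 1]); with real data you would use your Clustal Omega matrix instead.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Made-up toy data: six "sequences" at these positions on a line, so that
# pairwise distances lie in [0, 1] and 0 means identical.
pos = np.array([0.0, 0.05, 0.1, 0.8, 0.85, 0.9])
D = np.abs(pos[:, None] - pos[None, :])

# Distance in [0, 1] -> similarity in [0, 1]; larger means more alike.
S = 1.0 - D

# affinity="precomputed" tells scikit-learn that S is already a similarity
# matrix. The diagonal "preference" defaults to the median of S.
ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
print(ap.labels_)
print(ap.cluster_centers_indices_)
```

The preference controls how many exemplars are chosen; lowering it below the median generally yields fewer clusters.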

I ran my code, the clusters looked reasonable, and I thought all was well... until I read that affinity matrices are typically calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I misunderstand the concept of an affinity matrix? Is there an easy way to compute one? The scikit-learn documentation suggests the following formula:

similarity = np.exp(-beta * distance / distance.std())

But what is beta, and what is distance.std()?
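A sketch of what that formula does, assuming `distance` is a NumPy array: `distance.std()` with no axis argument is the standard deviation over all entries of the matrix, which makes the result unitless, and `beta` is a free parameter you choose (the values below are arbitrary). Note that this formula exponentiates the distance itself, whereas a Gaussian kernel usually uses the squared distance. `heat_kernel` is a hypothetical helper name.

```python
import numpy as np


def heat_kernel(distance: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Map distances to similarities in (0, 1] via exp(-beta * d / std(d)).

    distance.std() is taken over *all* matrix entries, so multiplying every
    distance by the same positive constant leaves the similarities unchanged.
    Larger beta makes similarity fall off faster with distance.
    """
    return np.exp(-beta * distance / distance.std())


# Made-up distances (0 = identical).
D = np.array([[0.0, 0.2, 0.7],
              [0.2, 0.0, 0.65],
              [0.7, 0.65, 0.0]])
for beta in (0.5, 1.0, 2.0):
    print(beta, np.round(heat_kernel(D, beta), 3))
```

Identical items (distance 0) always get similarity 1; beta only changes how sharply the remaining similarities drop.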

I'm quite confused about the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

(Modified 5.9 years ago by Jean-Karim Heriche.)

learnBioinformatics (United States) wrote, 5.9 years ago:

In R, it can be calculated as follows.

For example, suppose you have a 200 x 300 matrix A (200 sequences, each represented by 300 features):

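library(fields)   # provides rdist(), a fast way to calculate Euclidean distances between rows

d <- rdist(A)     # d is the 200 x 200 Euclidean distance matrix
                  # (if you already have a distance matrix, e.g. from Clustal Omega, use it as d instead)

sigma <- mean(d)  # kernel width: here, the mean of all entries of d

simMat <- exp(-d^2 / (2 * sigma^2))   # Gaussian (heat) kernel; 1 on the diagonal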

simMat is then probably what you want.

Hope this helps.
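For Python users starting from a precomputed distance matrix like the one in the question, a Gaussian kernel of the form exp(-d^2 / (2 sigma^2)) can be sketched as below. Setting sigma to the mean off-diagonal distance is an assumed rule of thumb, not something the method fixes; `gaussian_affinity` is a hypothetical helper name and the distances are made up.

```python
import numpy as np
from typing import Optional


def gaussian_affinity(D: np.ndarray, sigma: Optional[float] = None) -> np.ndarray:
    """Gaussian (RBF / heat) kernel: exp(-d^2 / (2 * sigma^2)).

    If sigma is not given, use the mean off-diagonal distance as the kernel
    width (an assumed heuristic; other choices are common).
    """
    if sigma is None:
        off_diag = ~np.eye(D.shape[0], dtype=bool)
        sigma = D[off_diag].mean()
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))


# Made-up distances (0 = identical).
D = np.array([[0.0, 0.2, 0.7],
              [0.2, 0.0, 0.65],
              [0.7, 0.65, 0.0]])
print(np.round(gaussian_affinity(D), 3))
```

Excluding the zero diagonal when averaging keeps sigma from shrinking as the number of sequences grows small.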

Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 5.9 years ago:

An affinity matrix is simply a similarity matrix used as input to the affinity propagation algorithm. From
Affinity propagation ... takes as input measures of similarity between pairs of data points ...

In the context of clustering, a similarity measure is simply the converse of a distance, i.e. a distance of 0 means the highest similarity. If your distance d is between 0 and 1, then s = 1 - d is a valid similarity measure. You can also convert a distance into a similarity using a radial basis function (a.k.a. Gaussian or heat kernel).
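To make "valid similarity measure" concrete, here is a small sketch comparing both conversions on made-up distances. Both are strictly decreasing functions of d, so they rank pairs identically; the sigma choice (mean off-diagonal distance) is an assumption for illustration.

```python
import numpy as np

# Made-up distances in [0, 1] (0 = identical), for illustration only.
D = np.array([[0.0, 0.2, 0.7, 0.9],
              [0.2, 0.0, 0.65, 0.8],
              [0.7, 0.65, 0.0, 0.1],
              [0.9, 0.8, 0.1, 0.0]])

s_linear = 1.0 - D                             # valid because 0 <= d <= 1
sigma = D[~np.eye(len(D), dtype=bool)].mean()  # assumed kernel width
s_rbf = np.exp(-D ** 2 / (2.0 * sigma ** 2))   # Gaussian / heat kernel

# Compare the ordering of all distinct pairs (upper triangle).
iu = np.triu_indices(len(D), k=1)
order_d = np.argsort(D[iu])
print(np.array_equal(order_d, np.argsort(-s_linear[iu])))  # True
print(np.array_equal(order_d, np.argsort(-s_rbf[iu])))     # True
```

The ranking is the same, but the magnitudes differ, and affinity propagation works with the magnitudes (sums of similarities and the preference), so the two choices can still give different clusterings.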
