Question: Compute Affinity Matrix From Distance Matrix
6.2 years ago by
United States
ericmajinglong100 wrote:

Hi guys,

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are all homologous to each other).

I want to use affinity propagation to cluster these sequences.

Initially, because I noticed by inspection that the distance matrix only contained values between 0 and 1 (a distance of 0 meaning 100% identity), I reasoned that I could simply take (1 - distance) as the affinity.
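For reference, the linear conversion described here can be sketched in Python. This is a minimal sketch under assumptions: the text layout expected by the hypothetical `parse_square_distmat` helper (a PHYLIP-style square matrix, i.e. a first line with the number of sequences, then one row per sequence holding its name and its distances) is assumed, so check it against your own Clustal Omega output first. The hypothetical `one_minus_distance` helper refuses values outside [0, 1], because only then does 1 - d map identical sequences to 1 and maximally distant ones to 0.

```python
import numpy as np

def parse_square_distmat(text):
    """Parse a square distance matrix from plain text.

    Assumed (hypothetical) PHYLIP-style layout: first line = number of
    sequences n, followed by n lines "<name> d_1 d_2 ... d_n".
    """
    lines = text.strip().splitlines()
    n = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])                              # sequence name
        rows.append([float(v) for v in fields[1:n + 1]])     # its n distances
    return names, np.array(rows)

def one_minus_distance(D):
    """Linear distance-to-similarity conversion for distances in [0, 1]:
    d = 0 (identical) -> s = 1, d = 1 -> s = 0."""
    D = np.asarray(D, dtype=float)
    if D.min() < 0 or D.max() > 1:
        raise ValueError("1 - d is only a similarity in [0, 1] if d is in [0, 1]")
    return 1.0 - D

# Toy 3-sequence matrix standing in for a real distance-matrix file
example = """3
seqA 0.0 0.2 0.9
seqB 0.2 0.0 0.8
seqC 0.9 0.8 0.0
"""
names, D = parse_square_distmat(example)
S = one_minus_distance(D)
```

With a real file you would call something like `parse_square_distmat(open("distmat.txt").read())`, where the file name is only a placeholder.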

I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I get the concept of an affinity matrix wrong? Is there an easy way to compute the affinity matrix? scikit-learn offers the following formula:

```python
similarity = np.exp(-beta * distance / distance.std())
```

But what is `beta`, and what is `distance.std()`?
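Computationally, the two unknowns in that formula are easy to pin down, as this minimal NumPy sketch shows. Here `distance` is a NumPy array, so `distance.std()` is just the standard deviation over all entries of the distance matrix (diagonal zeros included); dividing by it makes the kernel independent of the units the distances are expressed in. `beta` is a free positive parameter that you choose: larger values make similarity decay faster with distance. The default of 1.0, the helper name `heat_kernel` and the toy matrix are assumptions for illustration.

```python
import numpy as np

def heat_kernel(distance, beta=1.0):
    """exp(-beta * distance / distance.std()), as in the formula quoted above.

    distance.std(): standard deviation of every entry of the matrix, used to
    rescale distances so the result does not depend on their units.
    beta: free, positive tuning parameter (1.0 is an assumed default).
    """
    distance = np.asarray(distance, dtype=float)
    return np.exp(-beta * distance / distance.std())

# Toy symmetric distance matrix (0 = identical sequences)
D = np.array([[0.0, 0.2, 0.9],
              [0.2, 0.0, 0.8],
              [0.9, 0.8, 0.0]])

for beta in (0.5, 1.0, 2.0):
    S = heat_kernel(D, beta)
    print(beta, np.round(S[0], 3))   # similarity of sequence 0 to each sequence
```

Two properties follow directly from the formula: the diagonal is always exp(0) = 1, and multiplying every distance by a constant leaves the result unchanged, because the standard deviation scales by the same constant.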

I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

modified 4.5 years ago by Jean-Karim Heriche • written 6.2 years ago by ericmajinglong100
4.5 years ago by
United States
learnBioinformatics40 wrote:

In R, it can be calculated like this.

For example, suppose you have a matrix A of size 200 x 300 (200 sequences, each represented by 300 features):

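```r
library(fields)                       # rdist() computes pairwise Euclidean distances quickly

D <- rdist(A)                         # D is the 200 x 200 Euclidean distance matrix
# (if you already have a distance matrix, e.g. from Clustal Omega, use it as D instead)
sigma <- mean(D)                      # kernel width: the mean pairwise distance
simMat <- exp(-D^2 / (2 * sigma^2))   # Gaussian (heat) kernel: 1 on the diagonal, near 0 for distant pairs
```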

Maybe simMat is what you want here.

Hope this helps.
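A rough Python counterpart of the Gaussian (heat) kernel exp(-d^2 / (2 * sigma^2)) is sketched below. Setting sigma to the mean of the distance matrix is a heuristic rather than a rule, `gaussian_affinity` is a hypothetical helper name, and the random feature matrix is made-up data. With a precomputed distance matrix, as in the question, the feature-to-distance step is skipped and the matrix is passed in directly.

```python
import numpy as np

def gaussian_affinity(D):
    """Gaussian (heat) kernel exp(-d^2 / (2 * sigma^2)).

    sigma is set to the mean of all entries of D (an assumed heuristic;
    other choices, e.g. the median distance, are also common).
    """
    D = np.asarray(D, dtype=float)
    sigma = D.mean()
    return np.exp(-D ** 2 / (2 * sigma ** 2))

# Case 1: start from a feature matrix (hypothetical random data, 20 items x 30 features)
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 30))
D_feat = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)  # pairwise Euclidean distances
S_feat = gaussian_affinity(D_feat)

# Case 2: the question's situation, an already computed distance matrix
D_msa = np.array([[0.0, 0.2, 0.9],
                  [0.2, 0.0, 0.8],
                  [0.9, 0.8, 0.0]])
S_msa = gaussian_affinity(D_msa)
```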

4.5 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche wrote:

An affinity matrix is simply a similarity matrix used as input to the affinity propagation algorithm. From http://www.psi.toronto.edu/index.php?q=affinity%20propagation:
> Affinity propagation ... takes as input measures of similarity between pairs of data points ...

In the context of clustering, a similarity measure is just the converse of a distance, i.e. a distance of 0 means the highest similarity. If your distance d lies between 0 and 1, then s = 1 - d is a valid similarity measure. You can also convert a distance into a similarity using a radial basis function (a.k.a. a Gaussian or heat kernel).
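To make the input/output relationship concrete, here is a minimal from-scratch NumPy sketch of affinity propagation's responsibility/availability message passing, run on s = 1 - d. It is illustrative only. It has no convergence check, and the preference (the diagonal value that controls how many exemplars emerge) defaults to the median similarity, a common choice assumed here. The helper name `affinity_propagation` and the toy data are hypothetical. For real data, scikit-learn's `AffinityPropagation(affinity='precomputed')` accepts a precomputed similarity matrix in the same way.

```python
import numpy as np

def affinity_propagation(S, preference=None, damping=0.5, max_iter=200):
    """Minimal affinity propagation on a precomputed similarity matrix S.

    Returns the exemplar indices and a cluster label for every point.
    """
    S = np.array(S, dtype=float)          # copy so the caller's matrix is untouched
    n = S.shape[0]
    rows = np.arange(n)
    # Diagonal = preference; the median is computed before the diagonal is overwritten
    S[rows, rows] = np.median(S) if preference is None else preference
    R = np.zeros((n, n))                  # responsibilities r(i, k)
    A = np.zeros((n, n))                  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)   # each point joins its most similar exemplar
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels

# Toy data: points at 0, 1, 2 and 10, 11, 12, with distances scaled into [0, 1]
x = np.array([0, 1, 2, 10, 11, 12], dtype=float)
D = np.abs(x[:, None] - x[None, :]) / 12.0
exemplars, labels = affinity_propagation(1.0 - D)   # s = 1 - d
print(exemplars, labels)
```

Raising the preference tends to produce more clusters and lowering it fewer, so with s = 1 - d or a heat kernel it is usually the parameter worth tuning.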