Question: Compute Affinity Matrix From Distance Matrix
ericmajinglong (United States) wrote, 6.2 years ago:

Hi guys,

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are homologous to each other).

I want to use affinity propagation to cluster these sequences.

Initially, because I observed by hand that the distance matrix only had values between 0 and 1, with 0 distance = 100% identity, I reasoned that I could just take (1 - distance) to get affinity.

I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I misunderstand the concept of an affinity matrix? Is there an easy way of computing the affinity matrix? scikit-learn offers the following formula:

similarity = np.exp(-beta * distance / distance.std())

But what is beta, and what is distance.std()?

I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

learnBioinformatics (United States) wrote, 4.5 years ago:

In R, it can be calculated as follows.

Suppose you have a 200 x 300 matrix A (200 sequences, each represented by 300 features):

library(fields)   # rdist() is a fast way to compute pairwise Euclidean distances

sqDist <- rdist(A)^2    # 200 x 200 matrix of squared Euclidean distances

t <- mean(sqDist)       # scale parameter: mean of the squared distances

simMat <- exp(-sqDist / (2 * t^2))   # Gaussian (heat) kernel similarity

simMat may be what you want.

Hope this helps.
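
For readers working in Python rather than R, the same idea can be sketched with NumPy. This is only an illustration, not a translation of the code above: the function name gaussian_affinity and the median-based default for sigma are assumptions made for this example, and the bandwidth is a tuning choice rather than a fixed rule.

```python
import numpy as np

def gaussian_affinity(features, sigma=None):
    """Turn an (n_samples, n_features) array into an (n, n) Gaussian similarity matrix.

    `gaussian_affinity` and the default bandwidth are illustrative assumptions.
    """
    x = np.asarray(features, dtype=float)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (x ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negatives caused by round-off
    if sigma is None:
        # Assumed heuristic: use the median off-diagonal distance as the bandwidth.
        off_diag = ~np.eye(len(x), dtype=bool)
        sigma = np.sqrt(np.median(sq_dist[off_diag]))
    # Identical rows get similarity 1; similarity decays towards 0 with distance.
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

Calling gaussian_affinity(A) on a 200 x 300 feature array would return a symmetric 200 x 200 matrix with ones on the diagonal. A smaller sigma makes the similarity fall off faster, which tends to produce more, tighter clusters.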

Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 4.5 years ago:

An affinity matrix is simply a similarity matrix used as input to the affinity propagation algorithm. From http://www.psi.toronto.edu/index.php?q=affinity%20propagation:
"Affinity propagation ... takes as input measures of similarity between pairs of data points ..."

In the context of clustering, a similarity measure is just the converse of a distance, i.e. a distance of 0 means highest similarity. If your distance metric d is between 0 and 1, then s = 1 - d is a valid similarity measure. You can also convert a distance into a similarity using a radial basis function (a.k.a. Gaussian/heat kernel).
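
Both conversions can be sketched in a few lines of NumPy. In the formula quoted in the question, beta is a free scaling parameter that you choose (a larger beta makes similarity decay faster with distance), and distance.std() is the standard deviation of all entries of the distance matrix, used to normalise the scale of the distances. The function name similarity_from_distance and its arguments are made up for this illustration.

```python
import numpy as np

def similarity_from_distance(dist, method="linear", beta=1.0):
    """Convert a precomputed distance matrix into a similarity matrix.

    `similarity_from_distance`, `method` and the default `beta` are
    illustrative assumptions, not part of any library.
    """
    d = np.asarray(dist, dtype=float)
    if method == "linear":
        # s = 1 - d only makes sense when every distance lies in [0, 1].
        if d.min() < 0.0 or d.max() > 1.0:
            raise ValueError("linear conversion expects distances in [0, 1]")
        return 1.0 - d
    if method == "heat":
        # Heat (Gaussian-style) kernel from the question: distances are
        # rescaled by their standard deviation, then beta sets the decay rate.
        return np.exp(-beta * d / d.std())
    raise ValueError(f"unknown method: {method!r}")
```

Either result keeps the ordering of the pairs (the closest pair stays the most similar), so both are usable as input to affinity propagation; they differ in how strongly large distances are penalised. If you use scikit-learn, such a matrix can be passed to AffinityPropagation(affinity="precomputed").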
