Hi guys,
I used Clustal Omega to compute a distance matrix for 500 protein sequences (they are all homologous to each other).
I want to use affinity propagation to cluster these sequences.
Initially, because I saw by inspection that the distance matrix only contains values between 0 and 1, with a distance of 0 meaning 100% identity, I reasoned that I could just take (1 - distance) as the affinity.
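A minimal sketch of that conversion, assuming the distance matrix is available as a square NumPy array. The 4x4 matrix below is made-up toy data standing in for the real 500x500 one, and the variable names are only for illustration. It feeds the result to scikit-learn's AffinityPropagation with affinity="precomputed", which expects similarities rather than distances.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy symmetric distance matrix with values in [0, 1] (hypothetical data,
# standing in for the Clustal Omega output).
distance = np.array([
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.85, 0.80],
    [0.80, 0.85, 0.00, 0.20],
    [0.90, 0.80, 0.20, 0.00],
])

# Linear conversion: distance 0 -> similarity 1, distance 1 -> similarity 0.
similarity = 1.0 - distance

# affinity="precomputed" tells scikit-learn to treat the input as a
# similarity matrix instead of computing one from feature vectors.
model = AffinityPropagation(affinity="precomputed", random_state=0)
labels = model.fit_predict(similarity)

print(labels)  # one cluster label per sequence
```

Any monotonically decreasing transform of the distances keeps the same ordering of pairs, so this linear version mainly differs from other choices in how strongly it separates near and far pairs.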
I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.
Did I misunderstand the concept of an affinity matrix? Is there an easy way to compute one from a distance matrix? scikit-learn offers the following formula:
similarity = np.exp(-beta * distance / distance.std())
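To make the formula concrete, here is a small runnable sketch of it, assuming distance is a NumPy array. The toy matrix and the value beta = 1.0 are placeholders chosen only for illustration, not recommended settings.

```python
import numpy as np

# Toy symmetric distance matrix (hypothetical data).
distance = np.array([
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.85, 0.80],
    [0.80, 0.85, 0.00, 0.20],
    [0.90, 0.80, 0.20, 0.00],
])

# Placeholder scaling factor: larger values make similarity fall off
# faster as distance grows.
beta = 1.0

# Called with no axis argument, ndarray.std() returns the standard
# deviation over all entries of the array, so dividing by it rescales
# the distances to a unit-free spread.
scale = distance.std()

# Exponential ("heat kernel" style) conversion: distance 0 -> similarity 1,
# and similarity decays towards 0 as distance increases.
similarity = np.exp(-beta * distance / scale)

print(similarity.round(3))
```

Unlike (1 - distance), this transform never reaches exactly 0 and stays positive for any non-negative distance.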
But what is beta, and what is distance.std()?
I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!