I have been looking into how the neighbor joining algorithm works and I have two questions about the first step, i.e. computing the matrix Q using the following formula:

Q(i,j) = (n - 2)d(i,j) - Σd(i,k) - Σd(j,k)

Where both summations are from k = 1 to n, n is the number of taxons, d() is the distance between two elements and i and j are specific taxons.

From my understanding this formula creates a new matrix containing the distance between leaves *i* and *j* multiplied by *(n-2)*. From this value we subtract the total distance from *i* to all other nodes and the total distance from *j* to all other nodes. This gives us a sense of how far away any given pair of nodes is from all other clusters.

I think we are using the term *(n-2)* as a scaling factor to ensure that the value calculated by the two sums does not dominate the result. If this is the case, why is it n-2 and not n-1 or just n?

Finally, I have also encountered the following formula for Q:

Q(i,j) = d(i,j) - 1/(n-2)Σd(i,k) - 1/(n-2)Σd(j,k)

Are these two formulas equivalent? If so why?

Thank you.

This is not a scaling factor, it comes from doing the sum over all leaves except i and j.