Question

Question regarding the neighbor-joining algorithm

0

Entering edit mode

4.5 years ago

psdanielxu • 0

I'm learning the neighbor-joining algorithm from both the wikipedia page and Saitou and Nei's original 1987 paper. I'm having a hard time understanding if the algorithms aligned in the two pieces are consistent with each other and whether they are working like I believe they are.

For deciding which nodes are neighbors to join, Wikipedia says to minimize this function:

Q(i,j) = n(n-2)d(i,j) - \sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k)

where d(i,j) is the distance between taxa i and j

My understanding is that we need to minimize d(i,j) and maximize the average of the d(i, all the other nodes that are not i or j) and d(j, all the other nodes that are not i or j). This could be represented as an equation like the one Wikipedia lays out. Is this consistent with the Nei, Saitou paper?

I'm also confused about whether the (n-2) normalization term is correct. My intuition from above regarding the averages says that the term should be (n) instead. That way the equation would be rewritten as

Q(i,j) = (n)d(i,j) - (n-2)Ai - d(i,j) - (n-2)Aj - d(i,j)

Q(i,j) = (n-2)d(i,j) - (n-2)Ai - (n-2)Aj

where Ai is the average of the distances between node i and the other nodes besides i and j and a similar thing for Aj. The extra d(i,j)s come from the fact that you are summing from k=1 to n in the original equation and the (n-2) term comes from the fact that we are discarding d(i,i) and d(i,j) from the first sum and d(j,j) and d(j,i) from the second sum.

What are your thoughts? Is the Wikipedia algorithm really correct?

phylogenies neighbor-joining • 806 views

ADD COMMENT • link updated 4.3 years ago by Biostar 20 • written 4.5 years ago by psdanielxu • 0