What Algorithm Is Used To Produce The Tree Splits In The Splitstree Software
8.4 years ago

Hi all,

I am trying to figure out how a tree is broken down into splits, for example by the method SplitsTree uses. I have read various papers and the SplitsTree manual, but to no avail. Take a simple tree like the following:

tree="((t5:2.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):1):0.065840,t3:0.170221):0.383247)"


SplitsTree gives the following splits:

2.161175 Split t5,
0.161175 Split t5 t4 t2 t1 t3,
0.104381 Split t5 t6 t2 t1 t3,
0.075411 Split t5 t6 t4 t1 t3,
0.075411 Split t5 t6 t4 t2 t3,
1.0 Split t5 t6 t4 t3,
0.06584 Split t5 t6 t3,
0.170221 Split t5 t6 t4 t2 t1,
0.77554 Split t5 t6,


Could anyone please explain how this is carried out, in particular how the t5, t6 distance (0.77554) is obtained? Thanks.

tree phylogenetics split biopython • 2.7k views
8.4 years ago
David W 4.8k

Hi Brian,

This is easier to understand if you can actually see what's going on. So, using the R library ape, plot your tree and label each of the edges:

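library(ape)
# Newick string from the question; read.tree() needs the trailing ";"
tr <- read.tree(text = "((t5:2.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):1):0.065840,t3:0.170221):0.383247);")
plot(tr)
edgelabels()  # number each edge on the plot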


A split is what happens when you remove any one of those edges, creating two sets of tips. If we remove edge 1 in that tree, we end up with one tree containing {t5, t6} and another with {t1, t2, t3, t4}. The labels you have only show one half of each split. The distance given for each split is the sum of the lengths of the edges that separate the two subsets.

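In this case it's edges 1 and 4 that connect our two subsets: both meet at the root, and removing either one gives the same split, so their lengths are added together. We can get the length like this: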

brlens <- tr$edge.length[c(1,4)]
sum(brlens)
## 0.77554
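
If you would rather check this outside R, here is a minimal Python sketch of the same idea. It is not SplitsTree's actual code, just an illustration: it assumes a plain Newick string with tip names and branch lengths only (no internal node labels or quoting), and the names `NEWICK`, `parse` and `split_weights` are made up for this example. Every edge is turned into the pair of tip sets it separates, and edges that give the same pair have their lengths added.

```python
from collections import defaultdict

# Newick string from the question (the variable name is hypothetical)
NEWICK = ("((t5:2.161175,t6:0.161175):0.392293,"
          "((t4:0.104381,(t2:0.075411,t1:0.075411):1):0.065840,"
          "t3:0.170221):0.383247)")


def parse(s, i=0):
    """Parse one Newick subtree starting at s[i].

    Returns (tips, edges, next_index): the tip names under this node and
    a list of (tips_below_edge, length) pairs for every edge in the subtree.
    """
    edges = []
    if s[i] != "(":
        # Leaf: read the name up to ":", "," or ")"
        j = i
        while j < len(s) and s[j] not in ":,)":
            j += 1
        return {s[i:j]}, edges, j

    tips = set()
    i += 1  # skip "("
    while True:
        child_tips, child_edges, i = parse(s, i)
        edges.extend(child_edges)
        length = 0.0
        if i < len(s) and s[i] == ":":
            # Read the branch length that follows the child
            j = i + 1
            while j < len(s) and s[j] not in ",)":
                j += 1
            length = float(s[i + 1:j])
            i = j
        edges.append((frozenset(child_tips), length))
        tips |= child_tips
        if s[i] == ",":
            i += 1  # another child follows
            continue
        i += 1  # skip ")"
        return tips, edges, i


def split_weights(newick):
    """Map each split (an unordered pair of tip sets) to its total length."""
    all_tips, edges, _ = parse(newick.rstrip(";"))
    weights = defaultdict(float)
    for below, length in edges:
        other = frozenset(all_tips) - below
        # {A, B} and {B, A} are the same split, so the two edges at the
        # root land on the same key and their lengths are summed.
        weights[frozenset((below, other))] += length
    return weights


w = split_weights(NEWICK)
key = frozenset((frozenset({"t5", "t6"}),
                 frozenset({"t1", "t2", "t3", "t4"})))
print(len(w))              # 9 distinct splits from 10 edges
print(round(w[key], 6))    # 0.77554
```

The tree has 10 edges but only 9 splits, because the two edges leaving the root describe the same split {t5, t6} | {t1, t2, t3, t4}. That is where 0.392293 + 0.383247 = 0.77554 comes from.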


Thanks for all your help. I have input the tree into R and I just want to make sure I have grasped your explanation correctly. Is the "aim" to attach each tip to the rest of the tree? E.g. the distance to attach t5 to the rest of the tree (t6, t4, t3, t2, t1) is 2.161175, and this is carried out for each tip in turn, namely t6, t4, t3, t2 and finally t1, resulting in the following:

t5: 2.161175
t6: 0.161175
t4: 0.104381
t3: 0.170221
t2: 0.075411
t1: 0.075411

The distances to attach the subtrees are then calculated:

t1, t2 to the rest of the tree (t3, t4, t5, t6): 1
t1, t2, t4 to the rest of the tree (t3, t5, t6): 0.065840

Am I correct in this thinking thus far?

Finally, t5, t6: the distance to attach this subtree to the rest of the tree is 0.77554. However, I am still unsure why it is the sum of R edge labels 1 and 4. To attach t5, t6 to t4, t3, t2, t1, why is it not just R label 4?

Thanks very much for your help, greatly appreciated.