Say that I have found the following phylogenetic tree for four species – a, b, c, and d, and this tree has a high likelihood:
      /\
   t₁/  \ t₂
    /    \
   a     /\
      t₃/  \ t₄      [tree 1]
       /    \
      /\     d
   t₅/  \ t₆
    /    \
   b      c
If I want to “extract information” about the tree of only the two species a and b, from the above tree – to the degree that this is possible,
    /\
 t₁/  \ ?           [tree 2]
  /    \
 a      b
what should be my guess for the branch length marked with a question mark in the second tree?
I am going to use this “subtree” as the starting point for further heuristic search, so I want a good guess to reduce the search time.
After “pruning” c and d from tree 1, there are several options for the branch length between root and b in tree 2:
- it could be set to t₂+t₃+t₅
- b could be moved up to the t₃/t₄ branch, making it t₂
- it could be set to the average value of t₂, t₃, and t₅
- it could be set to the length of the branch connecting b to its parent in the first tree, i.e. t₅
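
To make the four options concrete, here is a minimal Python sketch that computes each candidate. The numeric branch lengths and the names `t` and `candidate_lengths` are made up purely for illustration; they do not come from any real data set.

```python
# Hypothetical branch lengths for tree 1 (illustrative values only).
t = {"t1": 0.10, "t2": 0.05, "t3": 0.04, "t4": 0.12, "t5": 0.03, "t6": 0.06}

def candidate_lengths(t):
    """Return the four candidate root-to-b branch lengths for tree 2."""
    # Branches on the path from the root down to b in tree 1.
    path = [t["t2"], t["t3"], t["t5"]]
    return {
        "sum_of_path": sum(path),               # t2 + t3 + t5
        "regraft_at_t2": t["t2"],               # b moved up, length t2
        "mean_of_path": sum(path) / len(path),  # average of t2, t3, t5
        "terminal_only": t["t5"],               # b's own branch, t5
    }

candidates = candidate_lengths(t)
for name, value in candidates.items():
    print(f"{name}: {value:.4f}")
```

Only the first option leaves the total root-to-b distance of tree 1 unchanged; the other three shorten it.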
 
Do any of these options make more sense than the others? (Is there an obvious answer?) Is there any theory on this that I could look up?
[My initial thought was that t₂+t₃+t₅ is the best estimate since this conserves the time between the root and b which – assuming tree 1 is a good one – makes the states observed in b most likely.]
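
As a rough sketch of what I mean by conserving the root-to-b time, the pruning could be written as follows. The `Node` class, the `prune` function, and the numeric branch lengths are all hypothetical, written just for this illustration. The assumption is that whenever removing tips leaves an internal node with a single child, that node is merged into the child and the two branch lengths are added.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 0.0          # length of the branch above this node
    children: List["Node"] = field(default_factory=list)

def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a copy of the tree restricted to the tips in `keep` (None if empty)."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Unary node: merge it into its only child and add the branch lengths,
        # so the distance from the ancestors to the remaining tips is unchanged.
        only = kept[0]
        return Node(only.name, node.length + only.length, only.children)
    return Node(node.name, node.length, kept)

# Tree 1 with made-up branch lengths.
t1, t2, t3, t4, t5, t6 = 0.10, 0.05, 0.04, 0.12, 0.03, 0.06
tree1 = Node(children=[
    Node("a", t1),
    Node(length=t2, children=[
        Node(length=t3, children=[Node("b", t5), Node("c", t6)]),
        Node("d", t4),
    ]),
])

tree2 = prune(tree1, {"a", "b"})
for tip in tree2.children:
    print(tip.name, round(tip.length, 4))
```

With this rule, the branch leading to b in tree 2 comes out as t₂+t₃+t₅ and the branch leading to a stays t₁.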
Thanks! That's really interesting, I will definitely look into the node density effect. :-)