Question

Inferring Information About The Phylogenetic Tree For A Set Of Species From A Tree Of A Superset Of These

14.6 years ago

Ehamberg ▴ 130

Say that I have found the following phylogenetic tree for four species – a, b, c, and d, and this tree has a high likelihood:

      /\
   t₁/  \ t₂
    /    \
   a     /\
      t₃/  \ t₄      [tree 1]
       /    \
      /\     d
   t₅/  \ t₆
    /    \
   b      c

If I want to “extract information” about the tree of only the two species a and b, from the above tree – to the degree that this is possible,

    /\
 t₁/  \ ?           [tree 2]
  /    \
 a      b

what should be my guess for the branch length marked with a question mark in the second tree?

I am going to use this “subtree” as the starting point for further heuristic search, so I want a good guess to reduce the search time.

After “pruning” c and d from tree 1, there are several options for the branch length between root and b in tree 2:

it could be set to t₂+t₃+t₅
b could be moved up to the t₃/t₄ branch, making it t₂
it could be set to the average value of t₂, t₃, and t₅.
it could be set to the branch length connecting b to its parent in the first tree, i.e. t₅.

Do any of these options make more sense than the others? (Is there an obvious answer?) Is there any theory on this I could look up?

[My initial thought was that t₂+t₃+t₅ is the best estimate since this conserves the time between the root and b which – assuming tree 1 is a good one – makes the states observed in b most likely.]
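To make the four options concrete, here is a minimal sketch that computes each candidate from the branch lengths of tree 1. The numeric values and the function name `candidate_lengths` are made up purely for illustration; they are not estimates from any real data.

```python
# Hypothetical branch lengths for tree 1 (illustrative values only).
t = {"t1": 0.10, "t2": 0.05, "t3": 0.04, "t4": 0.12, "t5": 0.03, "t6": 0.06}


def candidate_lengths(t):
    """Return the four candidate root-to-b branch lengths listed above."""
    # Branches on the path from the root of tree 1 down to b.
    path = [t["t2"], t["t3"], t["t5"]]
    return {
        "sum_of_path": sum(path),               # t2 + t3 + t5
        "t2_only": t["t2"],                     # b moved up to the t3/t4 split
        "mean_of_path": sum(path) / len(path),  # average of t2, t3, t5
        "t5_only": t["t5"],                     # b's original terminal branch
    }


candidates = candidate_lengths(t)
for name, value in candidates.items():
    print(f"{name}: {value:.3f}")
```

Only the first candidate keeps the root-to-b path length of tree 1 unchanged; the other three shorten it.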

phylogenetics • 2.9k views


Ram · Answer 1 · 2011-03-30

14.6 years ago

Rvosa ▴ 580

I agree that the sum of branch lengths (t₂+t₃+t₅) is quite sensible from a biological point of view. Having said that, if you use the pruned subtree as input for an additional iteration, the branch length that is subsequently recovered will be smaller because of the reduced "node density effect" (e.g. see http://scholar.google.co.uk/scholar?q=node+density+effect, especially the work by Pagel, Meade and Venditti in various publications). If all you are after is reduced search time, you might as well pick a shorter branch (t₅?), because it will quite likely be nearer the optimum that will subsequently be recovered.


Thanks! That's really interesting, I will definitely look into the node density effect. :-)

Reply • 14.6 years ago by Ehamberg ▴ 130

score 2 · Answer 2 · 2011-03-30

14.6 years ago

Lars Juhl Jensen 11k

Without being an expert on phylogenetics by any means, I would say that your initial thought is correct. Assuming that tree 1 correctly reflects the evolution of the four species, the total branch length leading from the last common ancestor of a and b to b is t₂+t₃+t₅.
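As a rough sketch of that reasoning, the snippet below prunes a tree down to a chosen set of leaves and merges every internal node left with a single child into that child, adding the two branch lengths. The tuple layout `(name, branch_length, children)`, the function name `prune`, and the numeric branch lengths are all assumptions made for this example, not part of any particular phylogenetics package.

```python
def prune(node, keep):
    """Return a copy of `node` restricted to the leaves in `keep`, or None.

    A node is a tuple (name, branch_length, children); leaves have no children.
    An internal node left with a single child is merged into that child,
    and the two branch lengths are added together.
    """
    name, length, children = node
    if not children:
        return node if name in keep else None
    kept = [c for c in (prune(child, keep) for child in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        child_name, child_length, grandchildren = kept[0]
        return (child_name, length + child_length, grandchildren)
    return (name, length, kept)


# Tree 1 with hypothetical branch lengths (illustrative values only).
tree1 = ("root", 0.0, [
    ("a", 0.10, []),                      # t1
    ("n1", 0.05, [                        # t2
        ("n2", 0.04, [                    # t3
            ("b", 0.03, []),              # t5
            ("c", 0.06, []),              # t6
        ]),
        ("d", 0.12, []),                  # t4
    ]),
])

tree2 = prune(tree1, {"a", "b"})
print(tree2)
```

With these values, the branch leading to b in the pruned tree has length t₂+t₃+t₅, while the branch leading to a stays at t₁.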
