Question

Are There Multiple Ways To Write The Same Unrooted Tree Using Newick Format?

1

11.6 years ago

jli99 ▴ 150

Hi,

I have the same unrooted tree, manipulated by different software, resulting in different Newick files. I'm fairly sure the tree is still unrooted after the manipulation, so the topology should be the same even though the branch lengths differ. My question is: can the same unrooted tree have different Newick forms? And if so, how can I rewrite the Newick files so that they look the same (except for branch lengths)?

Thanks.

phylogenetics tree • 7.0k views

ADD COMMENT • link updated 11.3 years ago by aidan-budd 1.9k • written 11.6 years ago by jli99 ▴ 150

score 3 · Answer 1 · 2012-09-25

3

11.6 years ago

Farhat ★ 2.9k

When you represent an unrooted tree in Newick format, an arbitrary node is chosen as the root. Thus, you can have different Newick forms for the same tree depending on where this root is placed. Also, you can reorder the children of any node without changing the topology, e.g. ((A,B),(C,D)) and ((C,D),(A,B)) represent the same tree.
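One way to check whether two Newick strings describe the same unrooted topology is to compare their sets of non-trivial splits (bipartitions of the leaf set), which do not depend on the root position or on the order of children. The following is only a minimal sketch, not any particular package's method: `parse_newick` and `unrooted_splits` are hypothetical helper names, and the parser assumes unique, unquoted leaf names and simply discards branch lengths and internal labels.

```python
import re

PUNCT = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names.

    Assumption: labels are unquoted; branch lengths and internal
    labels (e.g. support values) are dropped.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label and/or branch length
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1
            return tuple(children)
        name = tokens[pos].split(":")[0].strip()
        pos += 1
        return name

    return node()

def unrooted_splits(text):
    """Return the set of non-trivial leaf bipartitions of the tree."""
    tree = parse_newick(text)

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(c) for c in node))

    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(c) for c in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # store the split as an unordered pair, so the side that
            # happens to contain the root does not matter
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits

same = unrooted_splits("((A,B),(C,D));") == unrooted_splits("((C,D),(A,B));")
different = unrooted_splits("((A,B),(C,D));") == unrooted_splits("((A,C),(B,D));")
```

Here `same` compares the two forms from the example above, and `different` compares against a tree that pairs A with C instead. For real data, an established phylogenetics library is likely a safer choice than a hand-written parser.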

ADD COMMENT • link 11.6 years ago by Farhat ★ 2.9k

0

In this case, is (A,(C,D),B) also the same? I was confused about adding support values to the arbitrary root. But now I think the root just doesn't have a support value?

ADD REPLY • link 11.6 years ago by jli99 ▴ 150

2

For bootstrapping, the value is attached to a branch, not to a node. For four leaves, there is only one bootstrap value, on the branch between the (A,B) clade and the (C,D) clade. Or, written as (A,(C,D),B), the only value is on the branch connecting the root and the parent of C and D. In general, a binary unrooted tree with n leaves has n-3 bootstrap values, one per internal branch.
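The n-3 count can be checked with a short degree-counting argument. This is a sketch, assuming every internal node has degree 3 and every leaf degree 1; the symbols i (number of internal nodes) and e (total number of branches) are introduced here only for the derivation.

```latex
\begin{align*}
e &= n + i - 1 && \text{(a tree with } n + i \text{ nodes has } n + i - 1 \text{ branches)} \\
n + 3i &= 2e = 2(n + i - 1) && \text{(sum of node degrees is twice the number of branches)} \\
i &= n - 2, \qquad e = 2n - 3 \\
e - n &= n - 3 && \text{(remove the } n \text{ terminal branches)}
\end{align*}
```

For n = 4 this gives a single internal branch, matching the four-leaf example.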

ADD REPLY • link 11.3 years ago by lh3 33k

0

No, that would have three descendants from the root node and would not be a binary tree any more. Though, (A,((C,D),B)) would be the same.

ADD REPLY • link 11.6 years ago by Farhat ★ 2.9k

1

If the string represents an unrooted tree, (A,(C,D),B) is exactly the same as ((A,B),(C,D)). In fact, some software intentionally puts a trifurcation at the root to emphasize that the tree is unrooted.
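To make equivalent files look the same, one option is to rewrite each tree in a fixed canonical form: drop a bifurcating root, re-root at the internal node next to the alphabetically first leaf, and sort the children of every node by the smallest leaf name beneath them. The sketch below is one possible convention, not a standard: `canonical_unrooted` and `parse_newick` are hypothetical names, leaf names are assumed unique and unquoted, the tree is assumed to have at least three leaves, and branch lengths and support values are discarded.

```python
import re
from itertools import count

PUNCT = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse Newick into nested tuples of leaf names (topology only)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label and/or branch length
            if pos < len(tokens) and tokens[pos] not in PUNCT:
                pos += 1
            return tuple(children)
        name = tokens[pos].split(":")[0].strip()
        pos += 1
        return name

    return node()

def canonical_unrooted(text):
    """Return a root-independent, order-independent Newick string."""
    adj = {}       # node id -> set of neighbouring node ids
    name = {}      # leaf id -> leaf name
    ids = count()

    def build(node):
        me = next(ids)
        adj[me] = set()
        if isinstance(node, str):
            name[me] = node
        else:
            for child in node:
                kid = build(child)
                adj[me].add(kid)
                adj[kid].add(me)
        return me

    root = build(parse_newick(text))

    # A bifurcating root is an artefact of the notation: splice it out
    # and join its two neighbours directly.
    if len(adj[root]) == 2:
        a, b = adj.pop(root)
        adj[a].discard(root)
        adj[b].discard(root)
        adj[a].add(b)
        adj[b].add(a)

    # Re-root at the internal node attached to the first leaf by name.
    first = min(name, key=name.get)
    (start,) = adj[first]

    def render(node, parent):
        # returns (smallest leaf name in the subtree, Newick text)
        if node in name:
            return name[node], name[node]
        parts = sorted(render(n, node) for n in adj[node] if n != parent)
        return parts[0][0], "(" + ",".join(t for _, t in parts) + ")"

    return render(start, None)[1] + ";"

forms = ["((A,B),(C,D));", "(A,(C,D),B);", "(A,((C,D),B));", "((D:1,C:2)90:1,(B,A));"]
canonical = {canonical_unrooted(f) for f in forms}
```

All four inputs in `forms` collapse to the single string `(A,B,(C,D));`, written with a trifurcation at the root. The recursion depth grows with tree height, so very deep trees may need an iterative traversal instead.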

ADD REPLY • link 11.3 years ago by lh3 33k

score 2 · Answer 2 · 2013-01-08

A description of the Newick tree format is given on Joe Felsenstein/Mary Kuhner's lab webpages:

http://evolution.genetics.washington.edu/phylip/newicktree.html

As Farhat says above, Newick represents a rooted tree. By convention, a binary unrooted tree is written with a polytomy (multifurcation) at the root node. Note, however, that this is only a convention. The tree (A,B,(C,D)) may represent a binary, unrooted, four-taxon tree. But it might also represent a non-binary rooted tree with a polytomy at the root: the root node is linked to two terminal branches (one leading to OTU A, the other to OTU B) and to an internal branch leading to the internal node that carries the external branches to OTUs C and D. (OTU: Operational Taxonomic Unit, http://en.wikipedia.org/wiki/Operational_taxonomic_unit )

Note the cute "paradox" that most methods of tree estimation used these days estimate unrooted trees, but for many applications of trees we want or need to make some inference about where the root is (or isn't). Hence we need some method or set of assumptions for placing the root (ideally one we can defend, and that we state when presenting our rooted trees) before we can use the trees we estimate in applications that require rooted trees.