Are There Multiple Ways To Write The Same Unrooted Tree Using Newick Format?
2
1
10.5 years ago
jli99 ▴ 150

Hi,

I have the same unrooted tree processed by different programs, which produced different Newick files. I'm fairly sure the tree is still unrooted after processing, so the topology should be unchanged even though the branch lengths differ. My question is: can the same unrooted tree have different Newick representations? And if so, how can I rewrite the Newick files so that they look the same (apart from branch lengths)?

Thanks.

phylogenetics tree • 6.4k views
3
10.5 years ago
Farhat ★ 2.9k

When you represent an unrooted tree in Newick format, an arbitrary node is chosen as the root. Thus, you can have different Newick forms for the same tree depending on where this root is placed. Also, the children of any node can be listed in any order without changing the topology, e.g. ((A,B),(C,D)) and ((C,D),(A,B)) represent the same tree.
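
One way to check this in practice is to compare the leaf bipartitions (splits) that each Newick string implies, since those depend neither on where the arbitrary root sits nor on the order of children. Below is a minimal Python sketch, not the API of any particular package: the helper names (parse_newick, leaf_set, splits, same_unrooted_topology) are made up for illustration, and it assumes unquoted, unique leaf names and no bracketed comments.

```python
import re

DELIMS = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse Newick into nested tuples of leaf names.
    Branch lengths and internal labels are dropped.
    Assumption: unquoted names, no [comments]."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label such as "95:0.1"
            if pos < len(tokens) and tokens[pos] not in DELIMS:
                pos += 1
            return tuple(children)
        name = tokens[pos].split(":")[0]  # drop ":length" if present
        pos += 1
        return name

    return node()

def leaf_set(node):
    """All leaf names at or below this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def splits(tree):
    """Non-trivial bipartitions of the leaves, one per internal branch."""
    everything = leaf_set(tree)
    result = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        stack.extend(node)
        side = leaf_set(node)
        other = everything - side
        if len(side) > 1 and len(other) > 1:
            # an unordered pair, so the root position does not matter
            result.add(frozenset([side, other]))
    return result

def same_unrooted_topology(newick_a, newick_b):
    a, b = parse_newick(newick_a), parse_newick(newick_b)
    return leaf_set(a) == leaf_set(b) and splits(a) == splits(b)
```

For example, same_unrooted_topology("((A,B),(C,D));", "((C,D),(A,B));") is True, while comparing "((A,B),(C,D));" with "((A,C),(B,D));" gives False. This recomputes leaf sets at every node, which is fine for small trees but not tuned for large ones.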

0

In this case, is (A,(C,D),B) also the same tree? I was confused about how to add a support value to the arbitrary root. But now I think the root simply doesn't have a support value?

2

For bootstrapping, the support value is attached to a branch, not to a node. For four leaves there is only one bootstrap value, on the branch between the (A,B) clade and the (C,D) clade. Written as (A,(C,D),B), that single value sits on the branch connecting the root to the parent of C and D. In general, a binary unrooted tree with n leaves has n-3 bootstrap values, one per internal branch (the tree has 2n-3 branches in total, n of which lead to leaves).
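
To make the "branch, not node" point concrete, here is a rough Python sketch that keys each support value by the split of leaves its branch induces, so the result does not depend on which side of the branch carries the label in the Newick string. The function name support_by_split is hypothetical, and the sketch assumes internal labels are numeric support values, leaf names are unquoted and unique, and there are no bracketed comments.

```python
import re

DELIMS = {"(", ")", ",", ";"}

def support_by_split(newick):
    """Map each labelled internal branch (as a leaf bipartition) to its support.
    Assumption: internal node labels, when present, are numeric support values."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", newick))
    pos = 0
    clades = []  # (leaves below the node, label or None) for every internal node

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            name = tokens[pos].split(":")[0]   # leaf: drop ":length"
            pos += 1
            return frozenset([name])
        pos += 1                               # consume "("
        below = node()
        while tokens[pos] == ",":
            pos += 1
            below = below | node()
        pos += 1                               # consume ")"
        label = None
        if pos < len(tokens) and tokens[pos] not in DELIMS:
            label = tokens[pos].split(":")[0] or None  # "" when only a length is given
            pos += 1
        clades.append((below, label))
        return below

    everything = node()
    table = {}
    for side, label in clades:
        other = everything - side
        # only internal branches (both sides have 2+ leaves) carry support
        if label is not None and len(side) > 1 and len(other) > 1:
            table[frozenset([side, other])] = float(label)
    return table
```

With this, "((A,B)95,(C,D));" and "(A,(C,D)95,B);" both yield a single entry for the split AB|CD with value 95.0, even though the label is written next to a different clade in each string.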

0

No, that would have three descendants from the root node and would not be a binary tree any more. Though, (A,((C,D),B)) would be the same.

1

If the string represents an unrooted tree, (A,(C,D),B) is exactly the same as ((A,B),(C,D)). In fact, some software intentionally puts a trifurcation at the root to emphasize that the tree is unrooted.
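
If the goal is to make two such strings look identical, one possible normalization (a sketch under stated assumptions, not a standard) is: remove a two-child root if there is one, re-root at the internal node attached to the alphabetically smallest leaf, and sort the children at every node. For a binary tree this leaves a trifurcation at the root. The function name canonical_unrooted is made up, and for brevity the tree is passed as nested tuples of leaf names rather than parsed from a Newick string. Branch lengths and support values are ignored, and the sketch assumes unique leaf names and at least three leaves.

```python
import itertools

def canonical_unrooted(tree):
    """Return one fixed Newick string for an unrooted topology.
    `tree` is nested tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    Assumptions: unique leaf names, at least three leaves."""
    adj = {}     # node id -> list of neighbouring node ids (undirected)
    names = {}   # node id -> leaf name (leaves only)
    ids = itertools.count()

    def build(node):
        nid = next(ids)
        adj[nid] = []
        if isinstance(node, str):
            names[nid] = node
        else:
            for child in node:
                cid = build(child)
                adj[nid].append(cid)
                adj[cid].append(nid)
        return nid

    root = build(tree)

    # A root with exactly two children is only an artefact of rooting:
    # join its two neighbours directly and drop it.
    if len(adj[root]) == 2:
        a, b = adj[root]
        adj[a].remove(root)
        adj[b].remove(root)
        adj[a].append(b)
        adj[b].append(a)
        del adj[root]

    # Re-root at the internal node next to the smallest leaf name.
    first_leaf = min(names, key=names.get)
    start = adj[first_leaf][0]

    def render(nid, parent):
        if nid in names:
            return names[nid]
        parts = sorted(render(n, nid) for n in adj[nid] if n != parent)
        return "(" + ",".join(parts) + ")"

    return render(start, None) + ";"
```

Under these rules ((A,B),(C,D)), (A,(C,D),B) and (A,((C,D),B)) all come out as the same string, so plain text comparison (or diff) works on the output.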

2
10.2 years ago

A description of the Newick tree format is given on the webpages of Joe Felsenstein's/Mary Kuhner's lab:

http://evolution.genetics.washington.edu/phylip/newicktree.html

As Farhat says above, Newick represents a rooted tree. By convention, a binary unrooted tree is written with a polytomy (multifurcation) at the root node. However, note that this is only a convention. The tree (A,B,(C,D)) may represent a binary, unrooted, four-taxon tree. But it might also represent a non-binary rooted tree with a polytomy at the root: the root node is linked to two terminal branches (one leading to OTU A, the other to OTU B) and to an internal branch leading to the internal node that joins the terminal branches to OTUs C and D. (OTU: Operational Taxonomic Unit, http://en.wikipedia.org/wiki/Operational_taxonomic_unit)

Note the cute "paradox" that most methods of tree estimation used these days estimate unrooted trees, but for many applications of trees we want or need to make some inference about where the root is (or isn't). Hence we need some method or assumption for placing the root (ideally one that we can defend, and that we state when presenting our rooted trees) before we can use the trees we estimate in applications that require rooted trees.