Question: Compare tree topologies (Newick)
Jautis (United States) wrote, 3.4 years ago:

Hi, I'm looking for an efficient way to compare group-level topology between trees when I have multiple individuals per group (see example below). To do this, I would like to (i) compile a list of observed Newick topologies and (ii) determine which topology fits a given tree. I can do this manually for a small number of trees (see example), but my final dataset will consist of ~3000 trees, many of which are expected to share common topologies.

Can you offer any suggestions on how to compile a list of observed topologies and to determine which topology fits a given tree? Thank you very much! Also, if you're familiar with a method starting with another format (e.g. hclust or phylo objects in R) or another strategy to determine which trees share a structure, that would work as well!

An example comparison:

Group 1 contains individuals 1 and 2; Group 2 contains individuals 4 and 5; Group 3 contains individuals 9, 10, and 11.
Newick1: ((9:0.01,(10:0.44,11:0.44):0.01):0,((4:0.14,5:0.14):0,(1:0.40,2:0.40):0):0)

Collapsed trees, because I don't care about differences in branch length:

Newick1: ((9,(10,11)),((4,5),(1,2)))
Newick2: ((10,(9,11)),((4,5),(1,2)))
Newick3: ((1,2),((4,5),(9,(10,11))))
Newick4: ((10,11),(9,((4,5),(1,2))))

Result: A compiled list of 4 Newick topologies, because the topology of each tree differs. Then a matrix saying Newick1 goes into type1, Newick2 into type2, etc.

From this information, I can then determine that trees 1 and 2 share a group-level topology (because only individuals within a group are switched), tree 3 represents a different topology, and group 3 is paraphyletic in tree 4.

jhc wrote, 3.4 years ago:

With ete compare you can compute distances for multiple trees at once; it reports Robinson-Foulds distances and the percentage of matching branches from ref to source trees and vice versa. It can also dump the matching and mismatching branches, and compare trees of different sizes.

Basic usage is simple:

$ ete3 compare -t tree1.nw tree2.nw tree3.nw (...) -r ref_tree.nw

Brice Sarver (United States) wrote, 3.4 years ago:

There are a variety of tree distances you can use to compare one tree with another (or with a set). The quickest way to do this is using treedist in the phangorn R package. You can calculate bipartition (i.e., Robinson-Foulds) distances, which should do what you want, but you can also calculate statistics that take branch lengths into account. Two trees share the same (bifurcating) topology if their RF distance is zero.
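To make the "RF distance is zero" criterion concrete, here is a minimal standard-library Python sketch. It is not phangorn or ete3 code: the function names `clades` and `rooted_rf` are made up for illustration, and it assumes rooted trees with unique leaf names and no internal node labels. It counts the clades found in one tree but not the other. The rooting assumption matters for the example in the question: Newick1 and Newick3 differ only in where the root sits, so a comparison that treats trees as unrooted would score them as identical. It is worth checking how a given tool handles rooting before relying on a distance of zero.

```python
import re


def clades(newick):
    """Return the set of clades (as frozensets of leaf names) in a rooted tree.

    Assumes unique leaf names and no internal node labels.
    """
    # Drop branch lengths such as ":0.44"; only the topology is compared.
    text = re.sub(r":[^,();]*", "", newick)
    stack, found = [], set()
    for token in re.findall(r"[()]|[^(),;\s]+", text):
        if token == "(":
            stack.append(set())          # start collecting a new clade
        elif token == ")":
            done = frozenset(stack.pop())
            found.add(done)
            if stack:
                stack[-1] |= done        # leaves also belong to the parent
        else:
            stack[-1].add(token)         # a leaf name
    return found


def rooted_rf(newick_a, newick_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(newick_a) ^ clades(newick_b))


NEWICK1_LENGTHS = ("((9:0.01,(10:0.44,11:0.44):0.01):0,"
                   "((4:0.14,5:0.14):0,(1:0.40,2:0.40):0):0)")
NEWICK1 = "((9,(10,11)),((4,5),(1,2)))"
NEWICK2 = "((10,(9,11)),((4,5),(1,2)))"

print(rooted_rf(NEWICK1_LENGTHS, NEWICK1))  # 0: same topology
print(rooted_rf(NEWICK1, NEWICK2))          # 2: clades (10,11) and (9,11) differ
```

Because the clades are stored as sets, the order in which children are written does not affect the result.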


Thanks for the response, that does seem like a good starting point. Do you know how I could avoid redundant comparisons? Making all n(n-1)/2 pairwise comparisons will be computationally intensive, but if I could avoid repeatedly comparing against the same topology, that would make things much more efficient.

Reply written 3.4 years ago by Jautis

You can easily compare the trees because a set of trees is effectively a list (a multiPhylo object). These sorts of things are also not computationally intensive, at least not the way I see you doing it. So, you'll have one 'query' tree and calculate the distances between that tree and all the others (i.e., about 3000 comparisons). Any trees that are identical to it can be collapsed.

Reply written 3.4 years ago by Brice Sarver
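One way to sidestep pairwise comparisons altogether is to reduce each tree to a canonical string and use that string as a dictionary key, so every tree is read exactly once. The sketch below is a standard-library Python illustration of that idea, not something taken from phangorn or ete3. All function names (`parse_newick`, `collapse`, `canonical`, `split_groups`, `classify`) are hypothetical, and it assumes rooted trees, unique leaf names, and no internal node labels. The four example trees from the question are written with balanced parentheses. Passing a leaf-to-group mapping collapses every clade whose leaves all belong to one group, which gives the group-level topology.

```python
import re
from collections import defaultdict


def parse_newick(text):
    """Parse Newick into nested tuples of leaf names, dropping branch lengths."""
    s = re.sub(r"\s+", "", re.sub(r":[^,();]*", "", text)).rstrip(";")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def node():
        nonlocal pos
        if peek() == "(":
            pos += 1                                  # consume "("
            children = [node()]
            while peek() == ",":
                pos += 1
                children.append(node())
            if peek() != ")":
                raise ValueError(f"unbalanced parentheses in {text!r}")
            pos += 1                                  # consume ")"
            return tuple(children)
        start = pos
        while peek() not in ("", ",", "(", ")"):
            pos += 1
        if start == pos:
            raise ValueError(f"missing leaf name in {text!r}")
        return s[start:pos]

    tree = node()
    if pos != len(s):
        raise ValueError(f"unexpected trailing text in {text!r}")
    return tree


def collapse(node, group_of):
    """Relabel leaves by group and merge clades made of a single group."""
    if isinstance(node, str):
        return group_of[node]
    kids = [collapse(child, group_of) for child in node]
    if all(isinstance(k, str) for k in kids) and len(set(kids)) == 1:
        return kids[0]
    return tuple(kids)


def canonical(node):
    """Newick-like string with children sorted, so child order is irrelevant."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(sorted(canonical(c) for c in node)) + ")"


def leaves(node):
    return [node] if isinstance(node, str) else [x for c in node for x in leaves(c)]


def split_groups(collapsed):
    """Groups that still occur more than once, i.e. are not monophyletic."""
    names = leaves(collapsed)
    return sorted({g for g in names if names.count(g) > 1})


def classify(trees, group_of=None):
    """Map each canonical topology to the names of the trees that show it."""
    types = defaultdict(list)
    for name, newick in trees.items():
        tree = parse_newick(newick)
        if group_of is not None:
            tree = collapse(tree, group_of)
        types[canonical(tree)].append(name)
    return dict(types)


TREES = {
    "Newick1": "((9,(10,11)),((4,5),(1,2)))",
    "Newick2": "((10,(9,11)),((4,5),(1,2)))",
    "Newick3": "((1,2),((4,5),(9,(10,11))))",
    "Newick4": "((10,11),(9,((4,5),(1,2))))",
}
GROUP_OF = {"1": "Group1", "2": "Group1", "4": "Group2", "5": "Group2",
            "9": "Group3", "10": "Group3", "11": "Group3"}

for topology, names in classify(TREES, GROUP_OF).items():
    print(topology, names)
```

Calling `classify(TREES)` without the mapping keeps the four individual-level topologies separate, while `classify(TREES, GROUP_OF)` puts Newick1 and Newick2 under the same key. `split_groups` applied to the collapsed Newick4 reports Group3, matching the observation in the question that group 3 does not form a single clade there. The work grows linearly with the number of trees, since no tree is ever compared directly against another.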