Question: Collapse Clades In A Newick Tree With Average Distance To Tips < X
4
gravatar for a1ultima
3.7 years ago by
a1ultima610
London
a1ultima610 wrote:

I have a hierarchical tree in Newick format, such as:

(A:0.556705,(B:0.251059,C:0.251059):0.305646):0.556705;

enter image description here

I need to collapse clades (sub-trees) whose average distances to the tips (terminal nodes/tips/leaves) are less than a given value, x. Such that input and collapsed output are in Newick format. For example, if B and C in the above collapsed into BC, we get something like:

(A:0.556705,BC:0.556705):0.556705;

enter image description here

Is there a simple way to programmatically achieve this collapsing given any Newick tree (e.g. in Python)?

Post-Answer Edit:

Thanks to @jhc and @Pierre Lindenbaum's answers - for anybody who needs to know whether their solutions actually work or not, they converge to the same outcome given some very complex trees! Bravo

 

 

distance tree phylogeny • 2.9k views
ADD COMMENTlink modified 3.6 years ago • written 3.7 years ago by a1ultima610
3
gravatar for jhc
3.7 years ago by
jhc2.5k
Germany
jhc2.5k wrote:

I had a similar script using ETE. I just wrote a quick example to fit your question:

from ete2 import Tree
def mean(array):
    return sum(array)/float(len(array))

def cache_distances(tree):
    ''' precalculate distances of all nodes to the root''' 
    node2rootdist = {tree:0}
    for node in tree.iter_descendants('preorder'):
        node2rootdist[node] = node.dist + node2rootdist[node.up]
    return node2rootdist

def collapse(tree, min_dist):
    # cache the tip content of each node to reduce the number of times the tree is traversed
    node2tips = tree.get_cached_content()
    root_distance = cache_distances(tree)

    for node in tree.get_descendants('preorder'):
        if not node.is_leaf():
            avg_distance_to_tips = mean([root_distance[tip]-root_distance[node]
                                         for tip in node2tips[node]])

            if avg_distance_to_tips < min_dist:
                # do whatever, ete support node annotation, deletion, labeling, etc.

                # rename
                node.name += ' COLLAPSED avg_d:%g {%s}' %(avg_distance_to_tips,
                                                 ','.join([tip.name for tip in node2tips[node]]))
                # label
                node.add_features(collapsed=True)

                # set drawing attribute so they look collapsed when displayed with tree.show()
                node.img_style['draw_descendants'] = False

                # etc...

# Example                
t = Tree("((A,(B:0.1,C:0.1)i1:0.1)i2:0.5,((D:0.1,(E:0.1,F:0.1)i3:0.1)i4:0.5,G)i5:0.3);", format=1)
print t.get_ascii(attributes=["dist", 'name'])
#                  /-1.0, A
#           /0.5, i2
#          |      |       /-0.1, B
#          |       \0.1, i1
#          |              \-0.1, C
#-1.0, NoName
#          |              /-0.1, D
#          |       /0.5, i4
#          |      |      |       /-0.1, E
#           \0.3, i5      \0.1, i3
#                 |              \-0.1, F
#                 |
#                  \-1.0, G


# Now we collapse
collapse(t, 0.5)
print t.get_ascii(attributes=['name'])
#                                       /-A
#      /i2 COLLAPSED avg_d:0.466667 {C,A,B}
#     |                                  |                            /-B
#     |                                   \i1 COLLAPSED avg_d:0.1 {C,B}
#     |                                                               \-C
#-NoName
#     |                                      /-D
#     |   /i4 COLLAPSED avg_d:0.166667 {F,D,E}
#     |  |                                  |                            /-E
#      \i5                                   \i3 COLLAPSED avg_d:0.1 {F,E}
#        |                                                               \-F
#        |
#         \-G


# tree visualization will hide collapsed nodes
t.show()


# collapsed nodes are labeled, so you locate them and prune them
for n in t.search_nodes(collapsed=True):
    for ch in n.get_children():
        ch.detach()
print t
#   /-i2 COLLAPSED avg_d:0.466667 {C,A,B}
#--|
#  |   /-i4 COLLAPSED avg_d:0.166667 {F,D,E}
#   \-|
#      \-G


print t.write()
# and the the pruned newick
#(i2 COLLAPSED avg_d_0.466667 {C_A_B}:0.5,(i4 COLLAPSED avg_d_0.166667 {F_D_E}:0.5,G:1)1:0.3);
ADD COMMENTlink written 3.7 years ago by jhc2.5k

Aha, brilliant cheers!

ADD REPLYlink written 3.7 years ago by a1ultima610

Just wanted to say this has really helped, you deserve a medal!

ADD REPLYlink written 3.6 years ago by a1ultima610

Yep, just what I was looking for.

ADD REPLYlink written 3.6 years ago by CuteDriving6630
4
gravatar for Dan Gaston
3.7 years ago by
Dan Gaston6.9k
Canada
Dan Gaston6.9k wrote:

You can probably use the ETE python package, it is designed to manipulate, analyze, and visualize phylogenetic trees. One of the ETE tutorials is how to do dynamic node collapsing given any python function while traversing the tree and exporting the new newick format with collapsed nodes:

ETE2 Tutorial: Node Collapse While Traversing

ADD COMMENTlink written 3.7 years ago by Dan Gaston6.9k
1

cheers for the recommendation +1

ADD REPLYlink written 3.7 years ago by a1ultima610
1

good point! I forgot about my own examples in the tutorial. the section on 'stopping criteria' is also interesting for this type of problems: http://pythonhosted.org/ete2/tutorial/tutorial_trees.html#advanced-traversing-stopping-criteria

ADD REPLYlink written 3.7 years ago by jhc2.5k
1
gravatar for Pierre Lindenbaum
3.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum102k wrote:

I wrote a javascript parser for newick using jison : http://zaach.github.io/jison/try/ and put the converter in a html page.

enter image description here

It seems to work with your example:

<script src="&lt;a href=" lindenb="" 10281068"="">lindenb/10281068"></script>

ADD COMMENTlink written 3.7 years ago by Pierre Lindenbaum102k

brilliant! cheers +1

ADD REPLYlink written 3.7 years ago by a1ultima610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1431 users visited in the last hour