Question: Phylogenetic Trees
Patrick wrote (9.1 years ago):

Hello,

Given a phylogenetic tree of species (fungi), I would like to test all possible cuts on internal branches (not on terminal branches leading to leaves). Each cut should give two subtrees, each containing at least 3 species.

How can I do that? Any programming language is fine, R or Perl for instance, but preferably not Python because I am not familiar with it.

I want to know all possible cuts, reporting the edge (branch) where each cut happens.

Thank you

Patrick

Tags: phylogenetics, tree • modified 8.9 years ago by DG • written 9.1 years ago by Patrick
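Removing one edge splits the leaf set into two parts (a bipartition), so one literal reading of the question is: list every edge whose removal leaves at least 3 species on each side. The sketch below does this without any library, assuming the tree is given in Newick format. `parse_newick`, `leaves` and `edge_cuts` are hypothetical helpers written for this example; the parser keeps only the topology (it discards branch lengths and support values and does not handle quoted labels), and each edge is reported by the clade hanging below it.

```python
def parse_newick(text):
    """Minimal Newick parser: returns nested lists, with leaf names as strings.

    Branch lengths, support values and internal labels are discarded;
    quoted labels and comments are not supported.
    """
    s = text.strip()
    pos = 0

    def read_label():
        # Consume "name:length" up to the next structural character.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",();":
            pos += 1
        return s[start:pos].split(":")[0].strip()

    def parse_node():
        nonlocal pos
        if s[pos] != "(":
            return read_label()          # a leaf
        children = []
        while True:
            pos += 1                     # skip "(" or ","
            children.append(parse_node())
            if s[pos] == ")":
                break
        pos += 1                         # skip ")"
        read_label()                     # drop support value / length of this clade
        return children

    return parse_node()


def leaves(node):
    """All leaf names below node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def edge_cuts(tree, min_size=3):
    """Return (clade, rest) for every edge whose removal leaves >= min_size leaves on both sides."""
    all_leaves = frozenset(leaves(tree))
    ref = min(all_leaves)        # fixed leaf used to recognise the same bipartition twice
    seen, cuts = set(), []

    def visit(node):
        for child in node:
            clade = frozenset(leaves(child))
            rest = all_leaves - clade
            key = rest if ref in clade else clade
            # the two edges at a bifurcating root give the same bipartition; keep one
            if len(clade) >= min_size and len(rest) >= min_size and key not in seen:
                seen.add(key)
                cuts.append((sorted(clade), sorted(rest)))
            if not isinstance(child, str):
                visit(child)

    visit(tree)
    return cuts


if __name__ == "__main__":
    nw = "(((A:0.1,B:0.2)0.9:0.05,C:0.3)1.0:0.1,((D:0.1,E:0.1):0.2,F:0.3):0.1,G:0.5);"
    for clade, rest in edge_cuts(parse_newick(nw)):
        print("edge above", clade, "->", len(clade), "vs", len(rest), "species")
```

Terminal branches never qualify, because one side would hold a single species, so they drop out without a separate check. Since each cut is treated as a bipartition, the result does not depend on where the tree is rooted.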

Try drawing some dendrograms first to figure out what you really want to do.

If you cut a dendrogram at a certain height, as I understand it, you will get a variable number of subtrees. You get exactly 2 subtrees if you cut exactly at the root node, 3 if you cut at the level of the second node from the root, and so on. So your constraint doesn't make much sense. Or do you want to enumerate all subtrees that have at least 3 leaves?
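The point about height cuts can be checked with a small sketch that cuts a dendrogram horizontally at height `h` and returns the resulting subtrees. The representation (an internal node is a `(height, children)` tuple, a leaf is a name at height 0), the helper names and the example tree are all made up for illustration. Cutting between the root and the next node yields 2 subtrees, cutting below that node yields 3, and single-leaf subtrees such as `C` appear as soon as a leaf hangs directly off a cut branch.

```python
# Hypothetical dendrogram: internal node = (height, [children]), leaf = name (height 0).
TREE = (3.0, [
    (2.0, [(1.0, ["A", "B"]), "C"]),
    (1.5, ["D", "E"]),
])


def leaves(node):
    """Leaf names below node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node[1] for name in leaves(child)]


def cut_at_height(node, h):
    """Subtrees obtained by cutting the dendrogram horizontally at height h."""
    if isinstance(node, str) or node[0] <= h:
        return [node]                      # this whole subtree lies below the cut
    return [sub for child in node[1] for sub in cut_at_height(child, h)]


if __name__ == "__main__":
    for h in (2.5, 1.8, 1.2):
        print(h, [leaves(s) for s in cut_at_height(TREE, h)])
```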

Leszek (IIMCB, Poland) wrote (9.1 years ago):

Hi,

I don't know of a solution in Perl or R, but here is a very simple solution in Python using the ETE toolkit. I strongly recommend ETE for working with any kind of tree structure; there are plenty of examples on its website. Here is my proposal:

``````python
import ete2  # ETE toolkit
nw="""(((((((((((Kla:0.4068711746,Ago:0.4935929271)0.9998500075:0.0649583305,(Skl:0.2127283384,Kwa:0.3386518162)0.9998500075:0.0596434075)0.9998500075:0.0483197008,(((((((Spa:0.0228270085,Sce:0.0291457683)0.9998500075:0.0184892412,Smi:0.0506383315)0.9998500075:0.0175699846,Sku:0.0626933799)0.9998500075:0.0208286899,Sba:0.0583097334)0.9998500075:0.2666181393,Sca:0.3359364675)0.9998500075:0.0507588121,Cgl:0.3920684197)0.9998500075:0.0540543658,Kpo:0.3889500060)0.9998500075:0.1059363031)0.9998500075:0.5647674584,Yli:1.2024129416)0.9998500075:0.1233697967,Ppa:0.7445082201)0.9998500075:0.3615987996,Clu:0.3980231942)0.9998500075:0.0602126907,(Cgu:0.3957858607,Dha:0.2633445543)0.9998500075:0.0572209566)0.9998500075:0.0737058420,Pst:0.2583969260)0.9998500075:0.1763783938,(Lel:0.2751884155,Cpp:0.2312606593)0.9998500075:0.1174550114)0.9998500075:0.1037171734,Ctr:0.1556849278)0.9998500075:0.1228683534,Cdu:0.0335965265,Cal:0.0282471295);"""

# create tree object
t = ete2.PhyloTree(nw)

# root by midpoint
t.set_outgroup(t.get_midpoint_outgroup())

# print tree as text
print(t)

# or show it graphically
t.show()

# get all leaf names (species names)
species = t.get_leaf_names()

subtrees = []  # will store pairs of subtree objects with at least 3 species each
i = 0          # number of bifurcating internal nodes tested
for n in t.traverse():  # iterate through all nodes of the tree, root included
    # skip nodes that do not have exactly 2 children (leaves have none)
    if len(n.get_children()) != 2:
        continue
    i += 1
    # store the 2 subtrees as d1 and d2
    d1, d2 = n.get_children()
    # check whether both subtrees contain at least 3 species (3 leaves)
    if len(d1.get_leaf_names()) < 3 or len(d2.get_leaf_names()) < 3:
        continue  # if not, skip
    # store both subtrees, print the iteration and show the node at which the tree was split
    subtrees.append((d1, d2))
    print('Iteration: %s' % i)
    n.show()

print("%s pairs of subtrees with at least 3 species found among %s cuts possible." % (len(subtrees), i))
``````

This splits the tree at each internal node and checks whether both descendant subtrees contain 3 or more leaves (terminal nodes). Also have a look at the t.prune() function, which removes all nodes of the tree except those in the list you pass as an argument.
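To make the idea behind `prune()` concrete, here is a dependency-free sketch of the same operation on leaf names only. It is not ETE's implementation: unlike `t.prune()` it does not keep internal nodes, branch lengths or node names. Trees are nested lists with leaf names as strings (an assumption of this sketch), `prune` and `to_newick` are hypothetical helpers, and a node left with a single child is collapsed so that the kept leaves retain their relationships.

```python
def prune(node, keep):
    """Restrict a nested-list tree to the leaves in keep; returns None if none survive."""
    if isinstance(node, str):
        return node if node in keep else None
    kept = [p for p in (prune(child, keep) for child in node) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]     # collapse a node that is left with a single child
    return kept


def to_newick(node):
    """Topology-only Newick string (without the trailing ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


if __name__ == "__main__":
    # same topology as the docstring example, without lengths: (((A,B),C),(D))
    tree = [[["A", "B"], "C"], ["D"]]
    print(to_newick(prune(tree, {"A", "C", "D"})) + ";")   # ((A,C),D);
```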

Hope it helps,

BTW: IPython is great when you are learning Python. It completes object names with Tab, and you can get information about any object or function by typing its name followed by `?`, for example:

``````
t.prune?
``````

``````
Type:        instancemethod
Base Class:    <type 'instancemethod'>
String Form:    <bound method PhyloNode.prune of <ete2.phylo.phylotree.PhyloNode object at 0x2592f90>>
Namespace:    Interactive
File:        /usr/local/lib/python2.6/dist-packages/ete2-2.0rev89-py2.6.egg/ete2/coretype/tree.py
Definition:    t.prune(self, nodes)
Docstring:
Prunes the topology of this node in order to conserve only a
selected list of leaf or internal nodes. The algorithm deletes
nodes until getting a consistent topology with a subset of
nodes. Topology relationships among kept nodes is maintained.

ARGUMENTS:
==========
* 'nodes' is a list of node names or node objects that must be kept.

EXAMPLES:
=========
t = Tree("(((A:0.1, B:0.01):0.001, C:0.0001):1.0[&&NHX:name=I], (D:0.00001):0.000001[&&NHX:name=J]):2.0[&&NHX:name=root];")
node_C = t.search_nodes(name="C")[0]
t.prune(["A","D", node_C])
print t
``````
DG wrote (8.9 years ago):

Perl with BioPerl would be a good choice. BioPerl's tree data structure lets you easily loop over all internal nodes, and you can split off a subtree at every internal node whose descendants include at least three leaf taxa.

The code for finding all clades in a tree, from BioPerl's Phylogenetic and Analysis Scrapbook, could easily be modified for this:

Finding All Clades in a Phylogenetic Tree
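The loop described above (visit every internal node and split off its subtree when it holds at least three leaf taxa) has the same shape in any language. For consistency with the other code in this thread it is sketched in Python rather than BioPerl; trees are nested lists with leaf names as strings, which is an assumption of the sketch and not BioPerl's data structure, and `split_off_clades` is a hypothetical helper. The root is skipped because splitting it off would leave nothing behind.

```python
def leaves(node):
    """Leaf names below node, left to right."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def split_off_clades(tree, min_leaves=3):
    """Yield (clade, remainder) for every non-root internal node with >= min_leaves leaves."""
    all_leaves = leaves(tree)

    def walk(node):
        if isinstance(node, str):
            return                       # leaves (terminal nodes) are never split off
        clade = leaves(node)
        if node is not tree and len(clade) >= min_leaves:
            yield clade, [name for name in all_leaves if name not in clade]
        for child in node:
            yield from walk(child)

    yield from walk(tree)


if __name__ == "__main__":
    tree = [[["A", "B"], "C"], [["D", "E"], "F"], "G"]
    for clade, rest in split_off_clades(tree):
        print(clade, "|", rest)
```

If, as in the question, the remainder must also contain at least three species, add a `len(rest) >= 3` test to the condition. With a bifurcating root, the two children of the root then describe the same cut and will both be reported.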