Question: Get distance matrix based on node tree
Chvatil wrote (12 months ago):

Hi, I need some help getting a phylogenetic distance matrix.

Here are two examples:

Example one:

import ete3
tree = ete3.Tree('(((A,B),C),D);')
print(tree)

         /-A
      /-|
   /-|   \-B
  |  |
--|   \-C
  |
   \-D

The matrix should then be:

    A   B   C   D
A   0   1   2   3
B   1   0   2   3
C   2   2   0   3
D   3   3   3   0

As you can see, A and B are the closest leaves, C is closer to A and B than it is to D, and D is the furthest leaf.

Example two (more complex):

tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
print(tree)

                  /-A
               /-|
            /-|   \-B
           |  |
         /-|   \-C
        |  |
      /-|   \-D
     |  |
     |  |   /-E
   /-|   \-|
  |  |      \-F
  |  |
--|   \-G
  |
  |   /-H
   \-|
      \-I

Here I should get the following matrix:

    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0

I tried the get_distance function in ete3, but it does not give a matrix based on node distance...

Joe wrote (12 months ago):

If you're happy to use dendropy instead of ete3 (both are very good), then it could be done as simply as:

import dendropy
tree = dendropy.Tree.get(path='path/to/tree.tree', schema='newick') # or whatever relevant format if not newick
pdm = tree.phylogenetic_distance_matrix()
pdm.to_csv('/path/to/output.csv')

Thank you, but pdm.to_csv('/path/to/output.csv') gives AttributeError: 'PhylogeneticDistanceMatrix' object has no attribute 'to_csv'. Anyway, I tried:

for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))

But I only get distances of zero between leaves:

Distance between 'A' and 'B': 0.0
Distance between 'A' and 'C': 0.0
Distance between 'A' and 'D': 0.0
Distance between 'A' and 'E': 0.0
Distance between 'A' and 'F': 0.0
Distance between 'A' and 'G': 0.0
Distance between 'A' and 'H': 0.0
Distance between 'A' and 'I': 0.0
Distance between 'B' and 'C': 0.0

...

(Reply written 12 months ago by Chvatil)

Ah, sorry, I think it's .write_csv(). Check the package documentation.

You are getting zero distances because your tree is topological only: it has no branch lengths. You can artificially 'fudge' this by making a cladogram of your tree and setting all the branch lengths to 1. Effectively, your nodes have no distance in the normal sense for a tree, just hierarchical relationships.
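
To see what unit branch lengths would yield, here is a minimal library-free sketch. It assumes the tree is written as nested Python tuples (a hypothetical stand-in for the parsed Newick tree, not an ete3 or dendropy object) and counts the edges on the path between each pair of leaves, which is what a patristic distance reduces to when every branch has length 1. The function names are made up for illustration.

```python
from itertools import combinations

def leaf_paths(tree, path=()):
    """Map each leaf name to the tuple of child indices leading to it from the root."""
    if isinstance(tree, str):  # a leaf is just its label
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths

def edge_count_distances(tree):
    """Count the edges between every pair of leaves (all branch lengths = 1)."""
    paths = leaf_paths(tree)
    dist = {}
    for a, b in combinations(paths, 2):
        pa, pb = paths[a], paths[b]
        # Length of the common prefix = depth of the most recent common ancestor.
        shared = 0
        for x, y in zip(pa, pb):
            if x != y:
                break
            shared += 1
        dist[a, b] = dist[b, a] = (len(pa) - shared) + (len(pb) - shared)
    return dist

tree = ((("A", "B"), "C"), "D")  # same topology as '(((A,B),C),D);'
d = edge_count_distances(tree)
print(d["A", "B"], d["A", "C"], d["A", "D"], d["C", "D"])
```

For this tree the pairs A-B, A-C, A-D and C-D come out as 2, 3, 4 and 3 edges, whereas the matrix in the question has 1, 2, 3 and 3 for the same pairs, so unit branch lengths alone may not reproduce the requested values.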

I'm not aware of any built-in functionality to calculate this based just on the 'rank'/'cardinality' of the nodes. It would be doable in principle by calculating the pairwise node ranks etc., but that's far more work than just faking a cladogram and using the built-in methods.
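
A rank-based calculation along those lines could be sketched as follows. This is only one reading of the matrices in the question: it assumes the distance between two leaves is the height of their most recent common ancestor, i.e. the largest number of edges from that ancestor down to any leaf. The tiny Newick parser handles label-only trees (no branch lengths or internal node names), and both function names are hypothetical, not part of ete3 or dendropy.

```python
def parse_newick(s):
    """Parse a label-only Newick string into nested tuples of leaf names."""
    s = s.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # skip ','
                children.append(node())
            pos += 1                      # skip ')'
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # leaf label

    return node()

def mrca_height_matrix(tree):
    """Return (leaves, matrix); matrix[i][j] is the height of the MRCA of leaves i and j."""
    leaves = []
    dist = {}

    def visit(node):
        """Return (leaf names under node, height of node in edges)."""
        if isinstance(node, str):
            leaves.append(node)
            return [node], 0
        results = [visit(child) for child in node]
        height = 1 + max(h for _, h in results)
        # Leaves from different children have this node as their MRCA.
        for i, (left, _) in enumerate(results):
            for right, _ in results[i + 1:]:
                for a in left:
                    for b in right:
                        dist[a, b] = dist[b, a] = height
        return [name for names, _ in results for name in names], height

    visit(tree)
    matrix = [[0 if a == b else dist[a, b] for b in leaves] for a in leaves]
    return leaves, matrix

leaves, matrix = mrca_height_matrix(parse_newick("(((A,B),C),D);"))
print("\t".join([""] + leaves))
for name, row in zip(leaves, matrix):
    print("\t".join([name] + [str(v) for v in row]))
```

Under that assumption the sketch gives the same values as both example matrices in the question; whether this is the definition the asker intends for other topologies is not confirmed in the thread.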

(Reply written 12 months ago by Joe)