Question: Get distance matrix based on node tree
Chvatil50 wrote, 6 months ago:

Hi, I need some help getting a phylogenetic distance matrix.

Here are two examples:

Example one:

import ete3
tree = ete3.Tree('(((A,B),C),D);')
print(tree)

         /-A
      /-|
   /-|   \-B
  |  |
--|   \-C
  |
   \-D

The matrix should then be:

    A   B   C   D
A   0   1   2   3
B   1   0   2   3
C   2   2   0   3
D   3   3   3   0

As you can see, A and B are the closest leaves, C is closer to A and B than it is to D, and D is the most distant leaf.
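Both example matrices appear to follow one rule (this is a reading of the numbers, not something stated in the question): the distance between two different leaves is the height of their most recent common ancestor (MRCA), where a node's height is the number of edges on the longest path from that node down to a leaf.

```latex
h(v) =
\begin{cases}
0 & \text{if } v \text{ is a leaf},\\
1 + \max_{c \in \mathrm{children}(v)} h(c) & \text{otherwise},
\end{cases}
\qquad
d(x, y) =
\begin{cases}
0 & \text{if } x = y,\\
h\bigl(\mathrm{mrca}(x, y)\bigr) & \text{otherwise}.
\end{cases}
```

In example one this gives h((A,B)) = 1, h(((A,B),C)) = 2 and h(root) = 3, i.e. d(A,B) = 1, d(A,C) = d(B,C) = 2 and d(x,D) = 3 for every other leaf x. In example two the root has height 6, which matches the H and I columns.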

Here is a second, more complex example:

tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
print(tree)

                  /-A
               /-|
            /-|   \-B
           |  |
         /-|   \-C
        |  |
      /-|   \-D
     |  |
     |  |   /-E
   /-|   \-|
  |  |      \-F
  |  |
--|   \-G
  |
  |   /-H
   \-|
      \-I

and here I should get the following matrix:

    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0

I tried ete3's get_distance() method, but it does not give me a matrix based on node distance...

Tags: distance, tree, matrix, phylogeny
Joe17k (United Kingdom) wrote, 6 months ago:

If you're happy to use dendropy instead of ete3 (both are very good), then it could be done as simply as:

import dendropy
tree = dendropy.Tree.get(path='path/to/tree.tree', schema='newick') # or whatever relevant format if not newick
pdm = tree.phylogenetic_distance_matrix()
pdm.write_csv('/path/to/output.csv')  # PhylogeneticDistanceMatrix has write_csv(), not to_csv()

Thank you, but pdm.to_csv('/path/to/output.csv') gives AttributeError: 'PhylogeneticDistanceMatrix' object has no attribute 'to_csv'. Anyway, I tried:

for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))

But I only get distances of zero between leaves:

Distance between 'A' and 'B': 0.0
Distance between 'A' and 'C': 0.0
Distance between 'A' and 'D': 0.0
Distance between 'A' and 'E': 0.0
Distance between 'A' and 'F': 0.0
Distance between 'A' and 'G': 0.0
Distance between 'A' and 'H': 0.0
Distance between 'A' and 'I': 0.0
Distance between 'B' and 'C': 0.0

...

(Reply by Chvatil50)

Ah, sorry, I think it's .write_csv(). Check the package documentation.

You are getting zero distances because your tree is topology-only: it has no branch lengths. You can artificially 'fudge' this by treating your tree as a cladogram and setting every branch length to 1. Effectively, your nodes have no distance in the usual sense for a tree, just hierarchical relationships.
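To see what the unit-branch-length trick actually measures, here is a minimal stdlib-only sketch (not DendroPy code). It assumes the tree is written as nested Python tuples with string leaves, a hypothetical stand-in for a parsed Newick tree, and counts the edges on the path between each pair of leaves, which is the patristic distance once every branch has length 1.

```python
from itertools import combinations


def unit_branch_distances(tree):
    """Pairwise leaf distances when every branch has length 1.

    `tree` is a nested tuple whose leaves are strings (a hypothetical
    stand-in for a parsed Newick tree). The distance between two leaves
    is the number of edges on the path joining them.
    """
    dist = {}

    def walk(node):
        # Return [(leaf, edges from `node` down to that leaf), ...].
        if isinstance(node, str):
            return [(node, 0)]
        # Each child's leaves are one edge further away from this node.
        per_child = [[(leaf, d + 1) for leaf, d in walk(child)] for child in node]
        # Leaves in different child clades meet at this node.
        for left, right in combinations(per_child, 2):
            for a, da in left:
                for b, db in right:
                    dist[(a, b)] = dist[(b, a)] = da + db
        return [item for group in per_child for item in group]

    for leaf, _ in walk(tree):
        dist[(leaf, leaf)] = 0
    return dist


# '(((A,B),C),D);' written as nested tuples
example_one = ((("A", "B"), "C"), "D")
d = unit_branch_distances(example_one)
print(d[("A", "D")], d[("C", "D")])  # prints: 4 3
```

For example one this gives A-D = 4 but C-D = 3, while the matrix in the question has 3 for both, so a unit-length cladogram yields edge counts along the path: a related, but different, matrix.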

I'm not aware of any built-in functionality that calculates this from just the 'rank'/'cardinality' of the nodes. It would be doable in principle by calculating the pairwise node ranks etc., but that's far more work than just faking a cladogram and using the built-in methods.
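When only topology matters, the 'node rank' route is fairly short. Here is a minimal sketch that assumes the tree is given as nested Python tuples with string leaves (a hypothetical stand-in for a parsed ete3 or DendroPy tree) and that the wanted distance between two leaves is the height of their most recent common ancestor, i.e. the number of edges on the longest path from that ancestor down to a leaf. That interpretation reproduces both matrices in the question, but it is a reading of the examples, not something stated in the thread. `mrca_height_matrix` and `print_matrix` are hypothetical helper names.

```python
from itertools import combinations


def mrca_height_matrix(tree):
    """Leaf-by-leaf matrix where d(x, y) is the height of the MRCA of x and y.

    `tree` is a nested tuple whose leaves are strings (a hypothetical
    stand-in for a parsed Newick tree). Height = number of edges on the
    longest path from a node down to one of its leaves.
    """
    dist = {}

    def walk(node):
        # Return (leaves under `node` in left-to-right order, height of `node`).
        if isinstance(node, str):
            return [node], 0
        results = [walk(child) for child in node]
        height = 1 + max(h for _, h in results)
        # Two leaves in different child clades have this node as their MRCA.
        for (left, _), (right, _) in combinations(results, 2):
            for a in left:
                for b in right:
                    dist[(a, b)] = dist[(b, a)] = height
        return [leaf for leaves, _ in results for leaf in leaves], height

    leaves, _ = walk(tree)
    for leaf in leaves:
        dist[(leaf, leaf)] = 0
    # Rows in leaf order, ready to print or hand to another library.
    return leaves, [[dist[(a, b)] for b in leaves] for a in leaves]


def print_matrix(leaves, rows):
    # Mimics the layout in the question (aligned for 1-char names, 1-digit values).
    print("    " + "   ".join(leaves))
    for name, row in zip(leaves, rows):
        print(name + "   " + "   ".join(str(v) for v in row))


# '((((((A,B),C),D),(E,F)),G),(H,I));' written as nested tuples
example_two = (((((("A", "B"), "C"), "D"), ("E", "F")), "G"), ("H", "I"))
leaves, rows = mrca_height_matrix(example_two)
print_matrix(leaves, rows)
```

Each leaf pair is assigned only at the node where the two leaves first split apart, so every off-diagonal cell is written once (as a symmetric pair). For a real tree file you would parse it with ete3 or DendroPy and apply the same recursion to their nodes' children; that adaptation is not shown here.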

(Reply by Joe17k)