Get a distance matrix based on the nodes of a tree
2.5 years ago
Chvatil

Hi, I need some help getting a phylogenetic distance matrix.

Here are two examples:

Example one:

import ete3
tree = ete3.Tree('(((A,B),C),D);')
print(tree)

         /-A
      /-|
   /-|   \-B
  |  |
--|   \-C
  |
   \-D


The matrix should then be:

    A   B   C   D
A   0   1   2   3
B   1   0   2   3
C   2   2   0   3
D   3   3   3   0


As you can see, A and B are the closest leaves, C is closer to A and B than it is to D, and D is the furthest leaf from all the others.

Here is a second, more complex example:

tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
print(tree)

                  /-A
               /-|
            /-|   \-B
           |  |
         /-|   \-C
        |  |
      /-|   \-D
     |  |
     |  |   /-E
   /-|   \-|
  |  |      \-F
  |  |
--|   \-G
  |
  |   /-H
   \-|
      \-I


and here I should get the following matrix:

    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0
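
Reading the two example matrices together, the requested distance appears to be the height of the two leaves' most recent common ancestor (MRCA), where the height of a node is the longest path, in edges, from that node down to any leaf below it. This is an inference from the examples, not a formula stated in the question; it is what makes E,F and H,I both come out at 1 even though their common ancestors sit at different depths in the tree:

```latex
h(v) =
\begin{cases}
  0 & \text{if } v \text{ is a leaf} \\
  1 + \max_{c \in \mathrm{children}(v)} h(c) & \text{otherwise}
\end{cases}
\qquad
d(i, j) =
\begin{cases}
  0 & \text{if } i = j \\
  h(\mathrm{MRCA}(i, j)) & \text{if } i \neq j
\end{cases}
```

For example 2: h((A,B)) = 1, h(((A,B),C)) = 2, h((((A,B),C),D)) = 3 and h((E,F)) = 1, so the node joining those two clades has height 1 + max(3, 1) = 4, which matches d(A,E) = 4 and d(E,F) = 1 in the matrix above.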


I tried the get_distance() function in ete3, but it does not give a matrix based on node distance...

tree distance matrix phylogeny
2.5 years ago
Joe

If you're happy to use dendropy instead of ete3 (both are very good), then it could be done as simply as:

import dendropy
tree = dendropy.Tree.get(path='path/to/tree.tree', schema='newick') # or whatever relevant format if not newick
pdm = tree.phylogenetic_distance_matrix()
pdm.to_csv('/path/to/output.csv')


Thank you, but pdm.to_csv('/path/to/output.csv') gives AttributeError: 'PhylogeneticDistanceMatrix' object has no attribute 'to_csv'. Anyway, I tried:

for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))


But I only get distances of zero between leaves:

Distance between 'A' and 'B': 0.0
Distance between 'A' and 'C': 0.0
Distance between 'A' and 'D': 0.0
Distance between 'A' and 'E': 0.0
Distance between 'A' and 'F': 0.0
Distance between 'A' and 'G': 0.0
Distance between 'A' and 'H': 0.0
Distance between 'A' and 'I': 0.0
Distance between 'B' and 'C': 0.0


...


Ah, sorry, I think it's .write_csv(). Check the package documentation.

You are getting zero distances because your tree is topological only: it has no branch lengths. You can artificially 'fudge' this by making a cladogram of your tree, i.e. setting all the branch lengths to 1. Effectively, your nodes have no distance in the usual sense for a tree, only hierarchical relationships.
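
It is worth checking by hand what the fudge produces. Below is a minimal pure-Python sketch, assuming a topology-only Newick string with no branch lengths, whitespace or quoted labels; `parse_newick` and `unit_patristic_matrix` are hypothetical helpers written for this illustration, not ete3 or dendropy functions. It counts the edges on the path between each pair of leaves, which is the patristic distance dendropy's `phylogenetic_distance_matrix()` should report once every `edge.length` is set to 1 (for example in a loop over `tree.preorder_edge_iter()`).

```python
from itertools import combinations


def parse_newick(text):
    """Parse a topology-only Newick string such as '(((A,B),C),D);'.

    Internal nodes become lists of children and leaves become name strings.
    Assumes no branch lengths, whitespace or quoted labels.
    """
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            while pos < len(text) and text[pos] not in ",);":
                pos += 1  # skip an optional internal-node label
            return children
        start = pos
        while text[pos] not in ",);":
            pos += 1
        return text[start:pos]

    return node()


def unit_patristic_matrix(tree):
    """Number of edges between every pair of leaves, i.e. the patristic
    distance of the tree once every branch length is set to 1."""
    dist = {}

    def walk(node):
        # Returns [(leaf, edges from this node down to that leaf), ...]
        if isinstance(node, str):
            dist[(node, node)] = 0
            return [(node, 0)]
        groups = [[(leaf, d + 1) for leaf, d in walk(child)] for child in node]
        # Leaves in different child subtrees meet at this node.
        for left, right in combinations(groups, 2):
            for a, da in left:
                for b, db in right:
                    dist[(a, b)] = dist[(b, a)] = da + db
        return [pair for group in groups for pair in group]

    walk(tree)
    return dist


dist = unit_patristic_matrix(parse_newick("(((A,B),C),D);"))
for pair in [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D")]:
    print(pair, dist[pair])
```

On example 1 this gives A-B = 2, A-C = 3, A-D = 4 and C-D = 3, so the unit-length cladogram does not reproduce the matrix in the question exactly: sister leaves come out at 2 rather than 1, and A-D is no longer equal to C-D, because the path length counts the edges on both sides of the common ancestor.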

I'm not aware of any built-in functionality to calculate this based just on the 'rank'/'cardinality' of the nodes. It would be doable in principle by calculating the pairwise node ranks, etc., but that's far more work than just faking a cladogram and using the built-in methods.
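
The pairwise-rank route is not much code if the rule is taken to be the height of each pair's most recent common ancestor, which is an assumption inferred from the question's two matrices rather than a formula the poster gives. The sketch below is pure Python and assumes a Newick string with no branch lengths, whitespace or quoted labels; `parse_newick`, `mrca_height_matrix` and `print_matrix` are hypothetical helpers written for this illustration, not part of ete3 or dendropy. The recursion visits each internal node once and assigns that node's height to every pair of leaves that first meet there, so polytomies are handled as well.

```python
from itertools import combinations


def parse_newick(text):
    """Parse a topology-only Newick string into nested lists (leaves are str).

    Assumes no branch lengths, whitespace or quoted labels.
    """
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            while pos < len(text) and text[pos] not in ",);":
                pos += 1  # skip an optional internal-node label
            return children
        start = pos
        while text[pos] not in ",);":
            pos += 1
        return text[start:pos]

    return node()


def mrca_height_matrix(tree):
    """Return (leaf names, {(a, b): distance}) where the distance between two
    leaves is the height of their most recent common ancestor (MRCA).

    Height: a leaf is 0 and an internal node is 1 + its largest child height,
    i.e. the longest path in edges from the node down to a leaf.
    """
    dist = {}

    def walk(node):
        # Returns (height of node, leaf names below node in left-to-right order)
        if isinstance(node, str):
            dist[(node, node)] = 0
            return 0, [node]
        results = [walk(child) for child in node]
        height = 1 + max(h for h, _ in results)
        # Leaves taken from two different child subtrees have this node as MRCA.
        for (_, left), (_, right) in combinations(results, 2):
            for a in left:
                for b in right:
                    dist[(a, b)] = dist[(b, a)] = height
        return height, [leaf for _, below in results for leaf in below]

    _, leaves = walk(tree)
    return leaves, dist


def print_matrix(leaves, dist):
    """Print a square matrix in the same layout as the question."""
    print("    " + "".join(f"{name:<4}" for name in leaves).rstrip())
    for a in leaves:
        print((f"{a:<4}" + "".join(f"{dist[(a, b)]:<4}" for b in leaves)).rstrip())


leaves, dist = mrca_height_matrix(parse_newick("((((((A,B),C),D),(E,F)),G),(H,I));"))
print_matrix(leaves, dist)
```

Run on the second example, this prints the same values as the 9 by 9 matrix in the question, and on '(((A,B),C),D);' it gives the 4 by 4 one. The same recursion could be adapted to an ete3 tree by walking `node.children` and reading `node.name` at the leaves instead of using nested lists.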