Question: Get distance matrix based on node tree
Chvatil50 wrote:

Hi, I need some help getting a phylogenetic distance matrix.

Here are two examples:

Example one:

``````
import ete3

tree = ete3.Tree('(((A,B),C),D);')
print(tree)

         /-A
      /-|
   /-|   \-B
  |  |
--|   \-C
  |
   \-D
``````

The matrix should then be:

``````
    A   B   C   D
A   0   1   2   3
B   1   0   2   3
C   2   2   0   3
D   3   3   3   0
``````

As you can see, A and B are the closest leaves, C is closer to A and B than it is to D, and D is the furthest leaf.

Example two, a more complex tree:

``````
tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
print(tree)

                  /-A
               /-|
            /-|   \-B
           |  |
         /-|   \-C
        |  |
      /-|   \-D
     |  |
     |  |   /-E
   /-|   \-|
  |  |      \-F
  |  |
--|   \-G
  |
  |   /-H
   \-|
      \-I
``````

Here I should get the following matrix:

``````
    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0
``````

I tried the `get_distance` function in ete3, but it does not give a matrix based on node distance.

modified 6 months ago by Joe17k • written 6 months ago by Chvatil50
Joe17k wrote:

If you're happy to use `dendropy` instead of `ete3` (both are very good), then it could be done as simply as:

``````
import dendropy
tree = dendropy.Tree.get(path='path/to/tree.tree', schema='newick') # or whatever relevant format if not newick
pdm = tree.phylogenetic_distance_matrix()
pdm.to_csv('/path/to/output.csv')
``````

Thank you, but `pdm.to_csv('/path/to/output.csv')` gives `AttributeError: 'PhylogeneticDistanceMatrix' object has no attribute 'to_csv'`. Anyway, I tried:

``````
for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))
``````

But I only get distances of zero between leaves:

``````
Distance between 'A' and 'B': 0.0
Distance between 'A' and 'C': 0.0
Distance between 'A' and 'D': 0.0
Distance between 'A' and 'E': 0.0
Distance between 'A' and 'F': 0.0
Distance between 'A' and 'G': 0.0
Distance between 'A' and 'H': 0.0
Distance between 'A' and 'I': 0.0
Distance between 'B' and 'C': 0.0
``````

...

Ah, sorry, I think it's `.write_csv()`. Check the package documentation.

You are getting zero distances because your tree is topology-only: it has no branch lengths. You can artificially 'fudge' this by making a cladogram of your tree and setting all the branch lengths to `1`. Effectively, your nodes have no distance in the normal sense for a tree, just hierarchical relationships.
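
As a rough illustration of what the unit-length workaround produces, here is a dependency-free Python sketch (it is not dendropy code). It assumes example one is written by hand as nested tuples, and the helper names `root_paths` and `unit_length_distance` are made up for this illustration. With every branch set to length 1, the distance between two leaves is just the number of edges on the path between them.

```python
from itertools import combinations

# Example one from the question, written as nested tuples instead of Newick
# (an assumption made here to avoid needing a tree library).
tree = ((("A", "B"), "C"), "D")


def root_paths(clade, path=(), paths=None):
    """Map each leaf name to the tuple of child indices leading to it from the root."""
    if paths is None:
        paths = {}
    if isinstance(clade, str):
        paths[clade] = path
    else:
        for index, child in enumerate(clade):
            root_paths(child, path + (index,), paths)
    return paths


def unit_length_distance(path_a, path_b):
    """Count the edges between two leaves, i.e. the path length when every edge is 1."""
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    # Edges from each leaf up to the point where the two root paths diverge.
    return len(path_a) + len(path_b) - 2 * shared


paths = root_paths(tree)
for a, b in combinations(paths, 2):
    print("Distance between '%s' and '%s': %d" % (a, b, unit_length_distance(paths[a], paths[b])))
```

On example one this gives 2 for A and B and 4 for A and D, so the result is an edge count rather than exactly the matrix in the question.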

I'm not aware of any built-in functionality to calculate this based just on the 'rank'/'cardinality' of the nodes. It would be doable in principle by calculating the pairwise node ranks etc., but that's far more work than just faking a cladogram and using the built-in methods.
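
For the "doable in principle" route, one reading of the matrices in the question is that each entry is the height of the two leaves' most recent common ancestor, where height means the number of edges down to that ancestor's deepest leaf. That reading is an assumption inferred from the two examples, not something stated in the thread. A minimal sketch in plain Python follows; `parse_newick` and `mrca_height_matrix` are hypothetical helpers, and the parser only handles bare names, commas and parentheses (no branch lengths, quotes or internal labels).

```python
from itertools import combinations


def parse_newick(newick):
    """Parse a topology-only Newick string into nested lists of leaf names."""
    stack = [[]]
    name = ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append([])
        elif ch in ",)":
            if name:
                stack[-1].append(name)
                name = ""
            if ch == ")":
                clade = stack.pop()
                stack[-1].append(clade)
        elif not ch.isspace():
            name += ch
    if name:
        stack[-1].append(name)
    return stack[0][0]


def mrca_height_matrix(tree):
    """Return (leaves, dist) where dist[a][b] is the height of the MRCA of a and b."""
    dist = {}

    def visit(clade):
        if isinstance(clade, str):
            dist[clade] = {clade: 0}
            return 0, [clade]
        children = [visit(child) for child in clade]
        # Height = edges from this node down to its deepest leaf.
        height = 1 + max(child_height for child_height, _ in children)
        # Leaves from different children first meet at this node.
        for (_, left), (_, right) in combinations(children, 2):
            for a in left:
                for b in right:
                    dist[a][b] = dist[b][a] = height
        return height, [leaf for _, leaves in children for leaf in leaves]

    _, leaves = visit(tree)
    return leaves, dist


leaves, dist = mrca_height_matrix(parse_newick("(((A,B),C),D);"))
print("    " + "   ".join(leaves))
for a in leaves:
    print(a + "   " + "   ".join(str(dist[a][b]) for b in leaves))
```

Run on the two Newick strings from the question, this reproduces both matrices shown there. The same post-order idea should carry over to ete3 or dendropy node objects, though that is not tested here.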