Newick trees analysis software
5.1 years ago
roussine ▴ 10

Hello everyone,

Please take a minute to answer if you know. I need to output basic parameters of Newick text trees: branch lengths and node supports (ideally with averages or other basic statistics, but that is not a priority). Is there a good pipelinable tool that does this, before I resort to scripting? There are very many trees, so pipelining is essential. Either Unix or Windows is OK.

Thanks much in advance,
Leo


How do you need the output? As labels within a plotted tree? Keep in mind that Newick is not ideal for this, because the format does not guarantee that the values you need are present at all; that depends entirely on the software that wrote the file. A small example would help, but this format is rather easy to parse.
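Since the format is easy to parse, a dependency-free way to pull out the numbers is a pair of regular expressions: branch lengths follow a colon, and in files such as FastTree's the support value sits right after the closing parenthesis of each clade. This is a minimal sketch, not a full Newick parser: it assumes unquoted labels that contain no ':' or ')', and extract_values is a hypothetical name.

```python
import re

# A decimal number, optionally in scientific notation (e.g. 0.58, 1e-05).
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Branch lengths follow a colon: "label:0.30900" or ")0.580:0.08422".
_LENGTH_RE = re.compile(r":\s*(" + _NUM + r")")
# Internal-node supports sit right after a closing parenthesis: ")0.580".
_SUPPORT_RE = re.compile(r"\)\s*(" + _NUM + r")")


def extract_values(newick):
    """Return (branch_lengths, supports) as lists of floats, in file order.

    Assumes unquoted labels that contain no ':' or ')' characters.
    Leaves carry no support value, so only internal nodes are reported.
    """
    lengths = [float(x) for x in _LENGTH_RE.findall(newick)]
    supports = [float(x) for x in _SUPPORT_RE.findall(newick)]
    return lengths, supports


if __name__ == "__main__":
    lengths, supports = extract_values("((A:0.1,B:0.2)0.95:0.05,C:0.3);")
    print(lengths)   # [0.1, 0.2, 0.05, 0.3]
    print(supports)  # [0.95]
```

Because the root of a FastTree tree has neither a length nor a support in the file, nothing is counted for it.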


Michael – thanks for your response. I just need a text output for each tree: all branch lengths (or their average) and all node supports (or their average). The values need not be in one file. If no good pipelinable tool exists, extracting these values from the trees is the step that would require ad hoc scripting. I do not have an example tree at hand right now, but they are standard Newick files written by FastTree.


If you need the branch lengths, they're already encoded inside the newick format - what do you want to do with the values?

Here's some code I wrote a while back to work out distances in trees:

https://github.com/jrjhealey/bioinfo-tools/blob/master/tree_dists.py

You can use it like so, to get the pairwise distances between all tips: python tree_dists.py -m all -s newick -i mytreefile.tree.

You can also use it like so, to get the distance between the two most distant tips: python tree_dists.py -m max -s newick -i mytreefile.tree. max is the default, so you can also run this without -m to get the same result. If it's useful, I'll be happy to edit the code to change the output format or add other calculation options.
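For readers who prefer no dependencies, pairwise tip-to-tip (patristic) distances can also be computed with a hand-rolled parser. This is not the linked tree_dists.py; it is an illustrative sketch assuming one unquoted Newick tree per string, and Node, parse_newick and pairwise_tip_distances are hypothetical names. Each branch length is stored on the node below the branch, and distances are found by walking the tree as an undirected graph from each tip.

```python
import re

# Split on Newick punctuation; everything else is a label or a number.
_TOKEN_RE = re.compile(r"[(),:;]|[^(),:;]+")


class Node:
    """Minimal tree node: length belongs to the branch above the node."""

    def __init__(self, parent=None):
        self.name = ""
        self.length = 0.0
        self.children = []
        self.parent = parent


def parse_newick(text):
    """Parse one unquoted Newick tree into Node objects; return the root."""
    root = current = Node()
    expect_length = False
    for tok in _TOKEN_RE.findall(text):
        tok = tok.strip()
        if not tok:
            continue  # ignore whitespace between tokens
        if tok == "(":
            child = Node(parent=current)  # descend into the first child
            current.children.append(child)
            current = child
        elif tok == ",":
            sibling = Node(parent=current.parent)  # start the next sibling
            current.parent.children.append(sibling)
            current = sibling
        elif tok == ")":
            current = current.parent  # back up to the clade's own node
        elif tok == ":":
            expect_length = True
        elif tok == ";":
            break
        elif expect_length:
            current.length = float(tok)
            expect_length = False
        else:
            current.name = tok  # tip name, or support label on an internal node
    return root


def tips(node):
    """Yield all leaves below (and including) node."""
    if not node.children:
        yield node
    for child in node.children:
        yield from tips(child)


def distances_from(start):
    """Sum of branch lengths from start to every node in the tree."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        neighbours = list(node.children)
        if node.parent is not None:
            neighbours.append(node.parent)
        for nb in neighbours:
            if nb not in dist:
                # Each edge's length is stored on its lower (child) end.
                edge = nb.length if nb.parent is node else node.length
                dist[nb] = dist[node] + edge
                stack.append(nb)
    return dist


def pairwise_tip_distances(root):
    """Map (tip_a, tip_b) name pairs to their patristic distance."""
    leaves = list(tips(root))
    result = {}
    for i, a in enumerate(leaves):
        dist = distances_from(a)
        for b in leaves[i + 1:]:
            result[(a.name, b.name)] = dist[b]
    return result


if __name__ == "__main__":
    pairs = pairwise_tip_distances(parse_newick("((A:1,B:2)0.9:0.5,C:3);"))
    farthest = max(pairs, key=pairs.get)
    print(farthest, pairs[farthest])  # ('B', 'C') 5.5
```

For the 10-tip FastTree example in this thread this yields 45 pairs, and max(pairs, key=pairs.get) picks the two most distant tips.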


Dear jrj.healey, my need here is rather simple: to parse Newick and extract the numerical values of branch lengths and node supports. I will then compute basic statistics to assess the trees and bin them further. This is a core step of our phylogenomic approach to analysing orthology groups. Would you consider making your script extract such values?
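For the binning step, one simple option, assuming a mean support per tree has already been computed, is to sort trees into bins by fixed cut-offs with bisect. The cut-offs (0.5, 0.7, 0.9), bin labels and tree names below are hypothetical placeholders, not values from this thread.

```python
from bisect import bisect_right

# Hypothetical cut-offs on mean node support; replace with your own criteria.
CUTOFFS = [0.5, 0.7, 0.9]
LABELS = ["low", "medium", "high", "very_high"]


def bin_trees(mean_support_by_tree, cutoffs=CUTOFFS, labels=LABELS):
    """Group tree names by the interval their mean support falls into.

    A value equal to a cut-off goes into the higher bin (bisect_right).
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    bins = {label: [] for label in labels}
    for name, value in sorted(mean_support_by_tree.items()):
        bins[labels[bisect_right(cutoffs, value)]].append(name)
    return bins


if __name__ == "__main__":
    print(bin_trees({"OG0001.tree": 0.633, "OG0002.tree": 0.95}))
```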


I can look into it. It shouldn't be difficult; I believe ete3's object model stores each node together with its associated values.

Could you mock up some example input and output you'd expect?


OK, here is an example of the input and output.

Input would be a standard newick:

((logi|XP_009052348.1:0.30900,(dapu|EFX67985.1:0.40918,dare|NP_001007771.1:0.18921)0.580:0.08422)0.826:0.09733,cate|ELT98251.1:0.29370,(lian|XP_013420576.1:0.18354,((ocbi|XP_014783723.1:0.22136,(neve|XP_001634838.1:1.09355,(scma|XP_018652019.1:0.58808,ecmu|CDS40328.1:0.89059)0.920:0.47871)0.738:0.11872)0.572:0.03167,hero|XP_009022332.1:0.79732)0.005:0.06582)0.790:0.07221);

Output would be plain text columns:

Tree 1 (= treefile name)    
br_lens
0.30900
0.40918
…
AVERAGE: …
MEDIAN: …

node_support
0.580
0.826
…
AVERAGE: …
MEDIAN: …

And so on for each of the n trees (many thousands). Whether it all goes to one file or to separate files, whatever is easier to implement. Please let me know if this makes sense.
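A small formatter for exactly this layout might look like the following sketch, assuming the branch lengths and supports of one tree are already in two lists of floats. format_section and format_report are hypothetical names; str.format is used rather than f-strings so it also runs on Python 3.4, and printing five decimals is an arbitrary choice.

```python
from statistics import mean, median


def format_section(title, values):
    """Render one block: a header, one value per line, then AVERAGE and MEDIAN."""
    lines = [title]
    lines.extend("{:.5f}".format(v) for v in values)
    if values:
        lines.append("AVERAGE: {:.5f}".format(mean(values)))
        lines.append("MEDIAN: {:.5f}".format(median(values)))
    else:
        # statistics.mean/median raise on empty input, so report NA instead.
        lines.append("AVERAGE: NA")
        lines.append("MEDIAN: NA")
    return "\n".join(lines)


def format_report(tree_name, lengths, supports):
    """Build the per-tree text layout: tree name, br_lens block, node_support block."""
    return "Tree = {}\n{}\n\n{}".format(
        tree_name,
        format_section("br_lens", lengths),
        format_section("node_support", supports),
    )


if __name__ == "__main__":
    print(format_report("bs.tree", [0.309, 0.40918], [0.58, 0.826]))
```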

5.1 years ago
Joe 21k

OK, I haven't worked it into my other code just yet, but here's an approach you could use based on ete3:

from ete3 import Tree
import sys
from statistics import median

# Read the first line of the file (one Newick tree per file is assumed).
with open(sys.argv[1], 'r') as handle:
    t = Tree(handle.readline())

# All nodes, root first (ete3 traverses in level order by default).
# Note: the root has dist 0.0, and nodes with no support value in the
# file (the root and the leaves) get ete3's default support of 1.0.
nodes = [node for node in t.traverse()]

# Branch lengths: node.dist is the length of the branch above each node.
print('Tree = {}'.format(str(sys.argv[1])))
print('br_lens')
for node in nodes:
    print(node.dist)
print('AVERAGE: {}'.format(float(sum([node.dist for node in nodes])/len(nodes))))
print('MEDIAN: {}'.format(median([node.dist for node in nodes])))

# Node supports: the same approach, using node.support instead of node.dist.
print('\n')
print('node_support')
for node in nodes:
    print(node.support)
print('AVERAGE: {}'.format(float(sum([node.support for node in nodes])/len(nodes))))
print('MEDIAN: {}'.format(median([node.support for node in nodes])))

Given the input as bs.tree:

$ cat bs.tree
((logi|XP_009052348.1:0.30900,(dapu|EFX67985.1:0.40918,dare|NP_001007771.1:0.18921)0.580:0.08422)0.826:0.09733,cate|ELT98251.1:0.29370,(lian|XP_013420576.1:0.18354,((ocbi|XP_014783723.1:0.22136,(neve|XP_001634838.1:1.09355,(scma|XP_018652019.1:0.58808,ecmu|CDS40328.1:0.89059)0.920:0.47871)0.738:0.11872)0.572:0.03167,hero|XP_009022332.1:0.79732)0.005:0.06582)0.790:0.07221);

$ python3 script.py bs.tree

Tree = bs.tree
br_lens
0.0
0.09733
0.2937
0.07221
0.309
0.08422
0.18354
0.06582
0.40918
0.18921
0.03167
0.79732
0.22136
0.11872
1.09355
0.47871
0.58808
0.89059
AVERAGE: 0.3291227777777778
MEDIAN: 0.205285


node_support
1.0
0.826
1.0
0.79
1.0
0.58
1.0
0.005
1.0
1.0
0.572
1.0
1.0
0.738
1.0
0.92
1.0
1.0
AVERAGE: 0.8572777777777777
MEDIAN: 1.0
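In this output the first branch length (0.0) is the root, which has no branch above it, and the 1.0 supports belong to the root and the ten leaves, where ete3 fills in 1.0 because the file has no value. Those defaults pull the support average up to 0.857, whereas the seven supports actually written in bs.tree average 0.633 (median 0.738). A minimal sketch for filtering them out, assuming ete3-style nodes that expose dist, support, is_root() and is_leaf(); informative_values and summary are hypothetical helper names:

```python
from statistics import mean, median


def informative_values(nodes):
    """Split node values into (branch_lengths, supports), skipping defaults.

    nodes: any iterable of objects with dist, support, is_root() and
    is_leaf(), such as the nodes from ete3's Tree.traverse().
    The root is skipped entirely (no branch above it, no support in the
    file); leaves contribute a branch length but no support.
    """
    lengths, supports = [], []
    for node in nodes:
        if node.is_root():
            continue
        lengths.append(node.dist)
        if not node.is_leaf():
            supports.append(node.support)
    return lengths, supports


def summary(values):
    """Return (average, median), or (None, None) for an empty list."""
    if not values:
        return None, None
    return mean(values), median(values)
```

With ete3 installed this could be called as informative_values(Tree('bs.tree').traverse()), which for the example tree should keep 17 branch lengths and the 7 supports from the file.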

It's not the most elegant code in the world (it could probably be refactored to a function rather than loads of printing and list comprehensions) but hopefully that's close enough to what you need to suffice.

If you want to apply it to lots of trees I'd suggest doing something like:

$ for file in *.tree ; do python3 script.py "$file" > "${file%.*}"_output.txt ; done

(or look into parallel processing with GNU parallel or similar).
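If a single combined table is easier to pipe into further tools, the per-tree loop can also live in Python and write one tab-separated row per tree. This sketch swaps ete3 for a plain regex scrape (assuming unquoted labels and supports written right after ')'), so root and leaf defaults are not counted, and like the script above it reads only the first line of each file. summarise_trees.py and the column names are hypothetical.

```python
import glob
import re
import sys
from statistics import mean, median

# Branch lengths follow ':'; internal-node supports follow ')'.
NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
LENGTH_RE = re.compile(r":\s*(" + NUM + r")")
SUPPORT_RE = re.compile(r"\)\s*(" + NUM + r")")
HEADER = "tree\tn_br\tavg_br\tmed_br\tn_supp\tavg_supp\tmed_supp"


def stats(values):
    """Count, average and median as tab-separated text ('NA' if empty)."""
    if not values:
        return "0\tNA\tNA"
    return "{}\t{:.5f}\t{:.5f}".format(len(values), mean(values), median(values))


def summary_line(name, newick):
    """One output row: name, then branch-length stats, then support stats."""
    lengths = [float(x) for x in LENGTH_RE.findall(newick)]
    supports = [float(x) for x in SUPPORT_RE.findall(newick)]
    return "{}\t{}\t{}".format(name, stats(lengths), stats(supports))


def summarise(paths, out=sys.stdout):
    """Write a header plus one row per file (first line = first tree only)."""
    out.write(HEADER + "\n")
    for path in paths:
        with open(path) as handle:
            out.write(summary_line(path, handle.readline()) + "\n")


if __name__ == "__main__":
    # e.g.  python3 summarise_trees.py '*.tree' > summary.tsv
    pattern = sys.argv[1] if len(sys.argv) > 1 else "*.tree"
    summarise(sorted(glob.glob(pattern)))
```

Quoting the glob pattern lets Python expand it, which avoids "argument list too long" errors with many thousands of files.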


Dear jrj.healey, many thanks for the assistance and for looking into this. I tried the script, and Python returns this:

$ python3 br_lens+n_supp.py tree.tre
  File "br_lens+n_supp.py", line 6
    print(f'Tree = {sys.argv[1]}')
SyntaxError: invalid syntax

This might be version-dependent. Could you comment?
Thanks a lot


That's a strange error for sure. I would have expected it to be a little more informative (usually there is an arrow pointing at the problem). What version of Python are you using? f-strings were only introduced in Python 3.6 (I'm using 3.6.8 at the moment).


It's Python 3.4.3 (default, Nov 12 2018, 22:25:49) on Bio-Linux 8. And yes, there is an arrow; it points to the closing single quote:

    print(f'Tree = {sys.argv[1]}')
                                ^


Yes, this is a python version error then. Can you upgrade to 3.6 or higher?


I've updated the code in the answer post. I've dropped the f-strings, so it should now run on versions below 3.6, but you will still need Python 3 (3.4 or later) because it uses the statistics module.

I also noticed I was missing the division in my average calculations, so I've fixed that and updated the output accordingly.
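A version check at the top of a script can turn an obscure failure into a clear message. It cannot rescue f-strings on 3.4, though: the whole file is compiled before any line runs, so the SyntaxError appears first, which is why dropping the f-strings was the right fix. A minimal sketch, with the 3.4 minimum taken from the statistics module; MIN_VERSION and python_ok are hypothetical names.

```python
import sys

# statistics.mean/median were added in Python 3.4.
MIN_VERSION = (3, 4)


def python_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_ok():
        sys.exit("This script needs Python {}.{} or later; found {}".format(
            MIN_VERSION[0], MIN_VERSION[1], sys.version.split()[0]))
    # Portable alternative to f'Tree = {sys.argv[1]}' (f-strings need 3.6+).
    print("Tree = {}".format(sys.argv[1] if len(sys.argv) > 1 else "<none>"))
```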


Thank you. While I was fiddling with the data, python3 returned:

Traceback (most recent call last):
  File "br_lens+n_supp.py", line 1, in <module>
    from ete3 import Tree
ImportError: No module named 'ete3'

However, ete3 is installed, at /usr/local/bin/ete3.

What am I missing?


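It's probably a PYTHONPATH issue: /usr/local/bin/ete3 is the command-line program, and the ete3 package may have been installed for a different Python interpreter than the python3 you are running.

Try:

$ python3 -m pip install ete3

Then re-run and see if that works. Running pip through python3 -m makes sure the package is installed for that same interpreter.

(Or better yet, switch to using Python via conda, which makes the process even easier.)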
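To see which interpreter is running and whether it can find the ete3 package (as opposed to the ete3 command in /usr/local/bin), a quick check with the standard library could look like this; importlib.util.find_spec is available from Python 3.4, and module_location is a hypothetical helper name.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Which interpreter is this, and can it import ete3?
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("ete3:", module_location("ete3") or "not importable from this interpreter")
```

If this reports that ete3 is not importable, installing with that exact interpreter (python3 -m pip install ete3) should fix it.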


Thank you indeed for the valuable help. It all just works.

5.1 years ago
Michael 54k

Can you use Python? Then the ETE library should be ok for you, see http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#reading-and-writing-newick-trees . There is also the R package ape that can do that.

Please let me know if that is sufficient or if you need more guidance based on a concrete example of input and output.


Thank you, and thank you to jrj.healey for making a working script. I am not a python guy so I couldn't have done it.


Michael – thanks indeed for your involvement. I will go through your suggestions soon.
