Question: get closely related leaf nodes
Asked 5.3 years ago by Abdullah100 (Germany):

Hi,

I have a Newick tree:

`'(((61082:1,(764031:1,((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1)1:1)1:1)1:1,((1041945:1,(908214:1,252963:1)1:1)1:1,(121492:1,((450361:1,764034:1)1:1,(908212:1,(908213:1,908215:1)1:1)1:1)1:1)1:1)1:1)1:1,(((479641:1,((1313225:1,479639:1)1:1,467775:1)1:1)1:1,(((289401:1,289398:1)1:1,((253172:1,936147:1)1:1,479643:1)1:1)1:1,(((479640:1,153946:1)1:1,(281489:1,364019:1)1:1)1:1,((((400682:1,178514:1)1:1,((178539:1,178552:1)1:1,((6052:1,681720:1)1:1,(882799:1,(289074:1,394683:1)1:1)1:1)1:1)1:1)1:1,(458493:1,(283497:1,344322:1)1:1)1:1)1:1,333317:1)1:1)1:1)1:1)1:1,((479638:1,(55567:1,233783:1)1:1)1:1,(458489:1,36754:1)1:1)1:1)1:1);'`

For each leaf node, I want to get a list of closely related leaf nodes using ete2 (Python).

How can I do that?


Could you explain what you mean by "closely related"? Close by branch distance, by topology...?

I mean it in terms of topology: how large can the topological distance (from `tree.get_distance(node1, node2, topology_only=True)` in the ete2 Python package) be before two leaves no longer count as closely related?

This will get you the number of branches that separate two nodes. The cut-off for "closely related" is up to you, and it will depend on many factors. In general, I would say that branch length is a better proxy than topological distance (so, turn off the topology_only flag). This question is somewhat related: Which cut-off for collapsing this tree?
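To make the cut-off idea concrete, here is a minimal sketch in plain Python that lists, for each leaf, all other leaves within a chosen number of edges. It does not use ETE: the `Node` class, `parse_newick`, `leaves_within`, and `close_leaves` are illustrative names written for this example, and the parser only handles simple Newick strings without quoted labels. Distances here count edges on the path between two leaves, which may differ by a constant from ETE's own counting with `topology_only=True`, so treat the cut-off value as relative.

```python
import re
from collections import deque


class Node:
    """Minimal tree node; `up` and `children` mirror ETE's attribute names."""

    def __init__(self, parent=None):
        self.name = ""
        self.up = parent
        self.children = []

    def is_leaf(self):
        return not self.children


def parse_newick(text):
    """Parse a simple Newick string (no quoted labels) into Node objects."""
    root = node = Node()
    for tok in re.findall(r"[(),;]|[^(),;]+", text.strip()):
        if tok == "(":      # open a clade: descend to its first child
            node.children.append(Node(parent=node))
            node = node.children[-1]
        elif tok == ",":    # next sibling under the same parent
            node.up.children.append(Node(parent=node.up))
            node = node.up.children[-1]
        elif tok == ")":    # clade closed: move back up to it
            node = node.up
        elif tok == ";":
            break
        else:               # "label:length" -> keep only the label
            node.name = tok.split(":")[0].strip()
    return root


def iter_leaves(root):
    """Yield every leaf below `root`, left to right."""
    stack = [root]
    while stack:
        cur = stack.pop()
        if cur.is_leaf():
            yield cur
        stack.extend(reversed(cur.children))


def leaves_within(leaf, max_edges):
    """Breadth-first walk from `leaf`; collect other leaves up to max_edges away."""
    seen = {leaf}
    queue = deque([(leaf, 0)])
    found = []
    while queue:
        node, dist = queue.popleft()
        if node is not leaf and node.is_leaf():
            found.append(node.name)
        if dist == max_edges:
            continue  # do not expand past the cut-off
        neighbours = list(node.children)
        if node.up is not None:
            neighbours.append(node.up)
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return sorted(found)


def close_leaves(root, max_edges):
    """Map each leaf name to the names of leaves within max_edges edges."""
    return {leaf.name: leaves_within(leaf, max_edges) for leaf in iter_leaves(root)}


if __name__ == "__main__":
    toy = parse_newick("((A:1,B:1)1:1,(C:1,(D:1,E:1)1:1)1:1)1:1;")
    print(close_leaves(toy, 3))
```

Because the walk stops expanding at the cut-off, each leaf only visits its own neighbourhood rather than the whole tree. The same function can be pointed at the tree from the question by passing its Newick string to `parse_newick`.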

I think the branch lengths in my tree are always equal to 1.

Your tree seems to be based on NCBI taxonomy IDs. A good strategy would be to group closely related leaves based on their rank in the taxonomy database (e.g. same genus or family).

I wrote some scripts to query the NCBI taxonomy tree that may be of interest to you: https://github.com/jhcepas/ncbi_taxonomy

Thank you.
Yes, my tree is a bifurcated version of the NCBI tree whose leaf names are the taxonomy IDs (only the Metazoan tree).
Do you think using `python ./ncbi_query.py -t 9913 9031 9606 -x` will help me get what I want?

Answer, written 5.3 years ago by jhc2.8k (Spain):

Using this script you can annotate your tree using NCBI taxonomy as a reference (`ncbi_query.py -x -r yourtree.nw`).

This will output a new tree in extended Newick format in which all nodes contain NCBI information: species names, taxid, lineage track, rank, etc. You can then use ETE to load the tree and locate nodes matching your own criteria (e.g. rank=genus).

Note that the ncbi_taxonomy program is now unmaintained, as it has been integrated into the upcoming ETE 2.3 version.

Another useful tool for these taxid-based trees is the online visualization at http://etetoolkit.org/treeview. It is currently connected to the ETE ncbi_taxonomy module and performs on-the-fly translation of tip names.

If I want to use the NCBI genus/family, wouldn't it be easier to do so without using a tree? Since each leaf of my tree is an accession, I can easily get the taxonomy string from the gb file and compare the leaves that I want.

Sure, this is up to you. `ncbi_query.py -i -t [taxids ... ]` will dump info about the given taxids.
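For the tree-free route discussed in this exchange, a rough sketch of comparing taxonomy strings directly could look like the following. It assumes each leaf already has a semicolon-separated lineage string (as found in a GenBank record) and treats two leaves as related when they share at least `min_shared` leading lineage entries. The function names, the accession keys, and the lineage values are placeholders invented for illustration, not real records, and the entries carry no rank labels, so mapping a depth to "genus" or "family" is left to the reader.

```python
def split_lineage(taxonomy):
    """Turn 'A; B; C.' into ['A', 'B', 'C']."""
    return [part.strip() for part in taxonomy.rstrip(". ").split(";") if part.strip()]


def shared_prefix_len(a, b):
    """Number of leading lineage entries that two lineages have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def related_by_lineage(taxonomy_by_leaf, min_shared):
    """Map each leaf to the other leaves sharing >= min_shared leading entries."""
    lineages = {leaf: split_lineage(t) for leaf, t in taxonomy_by_leaf.items()}
    related = {}
    for leaf, lineage in lineages.items():
        related[leaf] = sorted(
            other
            for other, other_lineage in lineages.items()
            if other != leaf and shared_prefix_len(lineage, other_lineage) >= min_shared
        )
    return related


# Placeholder data: hypothetical accessions and lineage strings.
EXAMPLE = {
    "acc1": "Metazoa; CladeA; GenusX.",
    "acc2": "Metazoa; CladeA; GenusX.",
    "acc3": "Metazoa; CladeA; GenusY.",
    "acc4": "Metazoa; CladeB; GenusZ.",
}

if __name__ == "__main__":
    print(related_by_lineage(EXAMPLE, 3))
```

This compares every pair of leaves, which is fine for a few dozen leaves like the tree in the question; for much larger sets, grouping leaves by their truncated lineage in a dict would avoid the quadratic loop.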

Thanks. In any case, what is the fastest way to get the closest leaf node that does not have a specific feature?

Currently, I loop through all the leaves of the tree, check which ones do not have this feature, and record the topological distance from each of them to the node I am testing (in a dict); then I return the one with the minimum distance. However, this is quite slow. Any ideas for a faster way?
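One possible way to speed this up, sketched here in plain Python rather than with ETE, is a breadth-first search that starts at the query leaf and walks outward through parents and children, stopping at the first leaf that lacks the feature. Since all edges count as 1, the first such leaf reached is also a closest one, and no pairwise distances need to be computed. The `Node` class, `parse_newick`, and `nearest_leaf_without` are names made up for this example, and the feature test is passed in as a `has_feature` callable, which is an assumption about how the feature is stored. ETE nodes expose `up`, `children`, and `is_leaf()` as well, so the same walk should carry over to an ETE tree.

```python
import re
from collections import deque


class Node:
    """Minimal tree node; `up` and `children` mirror ETE's attribute names."""

    def __init__(self, parent=None):
        self.name = ""
        self.up = parent
        self.children = []

    def is_leaf(self):
        return not self.children


def parse_newick(text):
    """Parse a simple Newick string (no quoted labels) into Node objects."""
    root = node = Node()
    for tok in re.findall(r"[(),;]|[^(),;]+", text.strip()):
        if tok == "(":      # open a clade: descend to its first child
            node.children.append(Node(parent=node))
            node = node.children[-1]
        elif tok == ",":    # next sibling under the same parent
            node.up.children.append(Node(parent=node.up))
            node = node.up.children[-1]
        elif tok == ")":    # clade closed: move back up to it
            node = node.up
        elif tok == ";":
            break
        else:               # "label:length" -> keep only the label
            node.name = tok.split(":")[0].strip()
    return root


def find_leaf(root, name):
    """Return the leaf called `name`, or None if there is no such leaf."""
    stack = [root]
    while stack:
        cur = stack.pop()
        if cur.is_leaf() and cur.name == name:
            return cur
        stack.extend(cur.children)
    return None


def nearest_leaf_without(leaf, has_feature):
    """Return (node, edge_count) for a closest leaf where has_feature(node) is false.

    Returns (None, None) if every other leaf has the feature.
    """
    seen = {leaf}
    queue = deque([(leaf, 0)])
    while queue:
        node, dist = queue.popleft()
        if node is not leaf and node.is_leaf() and not has_feature(node):
            return node, dist  # BFS order guarantees minimal edge count
        neighbours = list(node.children)
        if node.up is not None:
            neighbours.append(node.up)
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None, None


if __name__ == "__main__":
    toy = parse_newick("((A:1,B:1)1:1,(C:1,(D:1,E:1)1:1)1:1)1:1;")
    flagged = {"B", "C"}  # hypothetical set of leaves that have the feature
    hit, dist = nearest_leaf_without(find_leaf(toy, "A"), lambda n: n.name in flagged)
    print(hit.name, dist)
```

Each query visits every node at most once, and usually far fewer, because the search ends as soon as a qualifying leaf is found. When several leaves are tied for closest, the one returned depends on the order of `children`.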