Question: get closely related leaf nodes
Abdullah (Germany) wrote, 4.9 years ago:

Hi,

I have a Newick tree:

'(((61082:1,(764031:1,((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1)1:1)1:1)1:1,((1041945:1,(908214:1,252963:1)1:1)1:1,(121492:1,((450361:1,764034:1)1:1,(908212:1,(908213:1,908215:1)1:1)1:1)1:1)1:1)1:1)1:1,(((479641:1,((1313225:1,479639:1)1:1,467775:1)1:1)1:1,(((289401:1,289398:1)1:1,((253172:1,936147:1)1:1,479643:1)1:1)1:1,(((479640:1,153946:1)1:1,(281489:1,364019:1)1:1)1:1,((((400682:1,178514:1)1:1,((178539:1,178552:1)1:1,((6052:1,681720:1)1:1,(882799:1,(289074:1,394683:1)1:1)1:1)1:1)1:1)1:1,(458493:1,(283497:1,344322:1)1:1)1:1)1:1,333317:1)1:1)1:1)1:1)1:1,((479638:1,(55567:1,233783:1)1:1)1:1,(458489:1,36754:1)1:1)1:1)1:1);'

 

For each leaf node, I want to get a list of closely related leaf nodes using the ete2 Python package.

How can I do that?

ete2 python • 1.2k views
modified 4.9 years ago by jhc • written 4.9 years ago by Abdullah

Could you explain what you mean by "closely related"? Close leaves by branch distance, by topology...?

written 4.9 years ago by jhc

I mean it in terms of topological distance: how large can the topological distance be (as returned by tree.get_distance(node1, node2, topology_only=True) in the ete2 Python package) for two leaves to still count as closely related?

modified 4.9 years ago • written 4.9 years ago by Abdullah

This will get you the number of branches that separate two nodes. The cut-off for "closely related" is up to you, and it will depend on many factors. In general, I would say that branch length is a better proxy than topological distance (so, turn off the topology_only flag). This question is somewhat related: Which cut-off for collapsing this tree?
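
As a rough illustration of applying such a cut-off, the sketch below lists, for each leaf, the leaves that lie within a chosen number of branches. It is plain Python and does not use ETE: the Node class, the minimal parse_newick function and the cut-off of 2 are assumptions made for this example, and the tree is a small clade copied from the question. With ETE, tree.get_distance(node1, node2, topology_only=True) would take the place of the distance helper, although its exact counting convention should be checked against the ETE documentation.

```python
class Node:
    def __init__(self, name="", parent=None):
        self.name = name
        self.parent = parent
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (labels and ':length' only, no quotes or comments)."""
    root = current = Node()
    token = ""
    for ch in text.strip().rstrip(";"):
        if ch in "(,)":
            if token:  # a label (and length) just ended: keep the label only
                current.name = token.split(":")[0]
                token = ""
            if ch == "(":
                child = Node(parent=current)
                current.children.append(child)
                current = child
            elif ch == ",":
                sibling = Node(parent=current.parent)
                current.parent.children.append(sibling)
                current = sibling
            else:  # ")" closes the current clade
                current = current.parent
        else:
            token += ch
    if token:
        current.name = token.split(":")[0]
    return root


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def topological_distance(a, b):
    """Number of branches on the path between a and b (branch lengths ignored)."""
    steps_from_a = {id(n): i for i, n in enumerate(path_to_root(a))}
    for j, n in enumerate(path_to_root(b)):
        if id(n) in steps_from_a:  # first shared ancestor
            return steps_from_a[id(n)] + j
    raise ValueError("nodes are not in the same tree")


def close_leaves(root, cutoff):
    """Map each leaf name to the names of leaves at most `cutoff` branches away."""
    all_leaves = leaves(root)
    return {
        a.name: [b.name for b in all_leaves
                 if b is not a and topological_distance(a, b) <= cutoff]
        for a in all_leaves
    }


# A small clade taken from the tree in the question.
tree = parse_newick("((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1);")
related = close_leaves(tree, cutoff=2)  # the cut-off of 2 is an arbitrary choice
```

With a cut-off of 2 only sister leaves are grouped together; raising the cut-off widens each group, which is why the threshold has to be chosen with the data in mind.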

written 4.9 years ago by jhc

I think the branch lengths in my tree are all equal to 1.

written 4.9 years ago by Abdullah

Your tree seems to be based on NCBI taxonomy IDs. A good strategy would be to group closely related leaves based on their rank in the taxonomy database (e.g. same genus/family).
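
The grouping idea can be sketched in a few lines of plain Python. Everything in the leaf_ranks table below is a hypothetical placeholder (the leaf names, genera and families are invented for illustration); in practice the ranks would come from the NCBI taxonomy database.

```python
from collections import defaultdict

# Hypothetical annotations (placeholder names, not real NCBI records).
leaf_ranks = {
    "leaf1": {"genus": "GenusA", "family": "FamilyX"},
    "leaf2": {"genus": "GenusA", "family": "FamilyX"},
    "leaf3": {"genus": "GenusB", "family": "FamilyX"},
    "leaf4": {"genus": "GenusC", "family": "FamilyY"},
    "leaf5": {"family": "FamilyY"},  # no genus assigned
}


def related_by_rank(leaf_ranks, rank):
    """Map each leaf to the other leaves that share its taxon at `rank`."""
    groups = defaultdict(list)
    for leaf, ranks in leaf_ranks.items():
        taxon = ranks.get(rank)
        if taxon is not None:  # leaves lacking this rank are not grouped
            groups[taxon].append(leaf)
    related = {}
    for leaf, ranks in leaf_ranks.items():
        members = groups.get(ranks.get(rank), [])
        related[leaf] = [other for other in members if other != leaf]
    return related


same_genus = related_by_rank(leaf_ranks, "genus")
same_family = related_by_rank(leaf_ranks, "family")
```

Choosing a higher rank (family instead of genus) gives larger groups, which plays the same role as a looser distance cut-off.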

I wrote some scripts to query the NCBI taxonomy tree that may be of interest to you: https://github.com/jhcepas/ncbi_taxonomy

written 4.9 years ago by jhc

Thank you.
Yes, my tree is a bifurcated version of the NCBI tree (Metazoa only), with taxonomy IDs as leaf names.
Do you think running: python ./ncbi_query.py -t 9913 9031 9606 -x would get me what I want?

written 4.9 years ago by Abdullah
jhc (Germany) wrote, 4.9 years ago:

Using this script you can annotate your tree using NCBI taxonomy as a reference (ncbi_query.py -x -r yourtree.nw).

This will output a new tree in extended Newick format in which all nodes contain NCBI information: species names, taxid, lineage track, rank, etc. You can then use ETE to load the tree and locate nodes matching your own criteria (e.g. rank=genus).

Note that the ncbi_taxonomy program is now unmaintained, as it has been integrated into the upcoming ETE 2.3 release.

modified 4.9 years ago • written 4.9 years ago by jhc

Another useful tool for these taxid-based trees is the inline visualization at http://etetoolkit.org/treeview. It is currently connected to the ETE ncbi_taxonomy module and translates tip names on the fly.

written 4.9 years ago by jhc

If I want to use the NCBI genus/family, wouldn't it be easier to do so without a tree? Each leaf in my tree is an accession, so I can easily get the taxonomy string from the GenBank file and compare the leaves I am interested in.
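
One way to compare such taxonomy strings, sketched under assumptions: each string is taken to be a semicolon-separated lineage ending in a period, and two leaves are scored by how many leading lineage entries they share. The accession names and lineages below are invented placeholders, not real GenBank records.

```python
def parse_lineage(taxonomy):
    """Split a 'A; B; C.' style taxonomy string into a list of names."""
    parts = taxonomy.strip().rstrip(".").split(";")
    return [part.strip() for part in parts if part.strip()]


def shared_depth(lineage_a, lineage_b):
    """Count how many leading lineage entries the two lists have in common."""
    depth = 0
    for x, y in zip(lineage_a, lineage_b):
        if x != y:
            break
        depth += 1
    return depth


def closest_by_lineage(taxonomy_strings):
    """Map each name to the other names sharing the deepest lineage prefix."""
    lineages = {name: parse_lineage(s) for name, s in taxonomy_strings.items()}
    closest = {}
    for name, lineage in lineages.items():
        scores = {other: shared_depth(lineage, other_lineage)
                  for other, other_lineage in lineages.items() if other != name}
        best = max(scores.values())
        closest[name] = sorted(o for o, s in scores.items() if s == best)
    return closest


# Hypothetical placeholder records (not real accessions or lineages).
taxonomy_strings = {
    "acc1": "Root; CladeA; FamilyX; GenusA.",
    "acc2": "Root; CladeA; FamilyX; GenusA.",
    "acc3": "Root; CladeA; FamilyX; GenusB.",
    "acc4": "Root; CladeB; FamilyY; GenusC.",
}
closest = closest_by_lineage(taxonomy_strings)
```

Note that lineage strings can list different numbers of intermediate groups for different organisms, so the shared depth is only a rough measure of relatedness.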

 

written 4.9 years ago by Abdullah

Sure, this is up to you. ncbi_query.py -i -t [taxids ...] will dump information about the given taxids.

written 4.9 years ago by jhc

Thanks. In any case, what is the fastest way to get the closest leaf node that does not have a specific feature?

Currently, I loop through all the leaves of the tree, pick the ones that do not have this feature, record each one's topological distance to the node I am testing in a dict, and then return the one with the minimum distance. However, this is quite slow. Any ideas for a faster way?
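
One possible speed-up, offered as a sketch rather than a tested ETE recipe: search outward from the query leaf breadth-first (through its parent and then down into neighbouring clades) and stop at the first leaf that lacks the feature, instead of measuring the distance to every leaf. The example is plain Python; the Node class, the minimal parse_newick function, the has_feature predicate and the flagged set are all assumptions made for illustration, and the tree is a small clade copied from the question. The same walk should be possible on ETE nodes through their parent and child links.

```python
from collections import deque


class Node:
    def __init__(self, name="", parent=None):
        self.name = name
        self.parent = parent
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (labels and ':length' only, no quotes or comments)."""
    root = current = Node()
    token = ""
    for ch in text.strip().rstrip(";"):
        if ch in "(,)":
            if token:
                current.name = token.split(":")[0]
                token = ""
            if ch == "(":
                child = Node(parent=current)
                current.children.append(child)
                current = child
            elif ch == ",":
                sibling = Node(parent=current.parent)
                current.parent.children.append(sibling)
                current = sibling
            else:  # ")" closes the current clade
                current = current.parent
        else:
            token += ch
    if token:
        current.name = token.split(":")[0]
    return root


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def nearest_leaf_without(start, has_feature):
    """Breadth-first search from `start`; return (leaf, branch_count) or None."""
    seen = {id(start)}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node is not start and not node.children and not has_feature(node):
            return node, dist  # first hit is the closest in number of branches
        neighbours = list(node.children)
        if node.parent is not None:
            neighbours.append(node.parent)
        for nxt in neighbours:
            if id(nxt) not in seen:
                seen.add(id(nxt))
                queue.append((nxt, dist + 1))
    return None


tree = parse_newick("((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1);")
by_name = {leaf.name: leaf for leaf in leaves(tree)}
flagged = {"908211", "764033"}  # hypothetical: leaves that DO have the feature


def has_feature(node):
    return node.name in flagged


nearest, distance = nearest_leaf_without(by_name["386100"], has_feature)
```

Each query visits a node at most once and stops at the first match, so close answers are found quickly. Because all branch lengths in this tree are 1, the branch count is also the branch-length distance; with unequal lengths a shortest-path search weighted by branch length would be needed instead.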

written 4.9 years ago by Abdullah

So there is no solution for this?

written 4.9 years ago by Abdullah