Get closely related leaf nodes
9.9 years ago
Abdullah ▴ 100

Hi,

I have a newick tree.

'(((61082:1,(764031:1,((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1)1:1)1:1)1:1,((1041945:1,(908214:1,252963:1)1:1)1:1,(121492:1,((450361:1,764034:1)1:1,(908212:1,(908213:1,908215:1)1:1)1:1)1:1)1:1)1:1)1:1,(((479641:1,((1313225:1,479639:1)1:1,467775:1)1:1)1:1,(((289401:1,289398:1)1:1,((253172:1,936147:1)1:1,479643:1)1:1)1:1,(((479640:1,153946:1)1:1,(281489:1,364019:1)1:1)1:1,((((400682:1,178514:1)1:1,((178539:1,178552:1)1:1,((6052:1,681720:1)1:1,(882799:1,(289074:1,394683:1)1:1)1:1)1:1)1:1)1:1,(458493:1,(283497:1,344322:1)1:1)1:1)1:1,333317:1)1:1)1:1)1:1)1:1,((479638:1,(55567:1,233783:1)1:1)1:1,(458489:1,36754:1)1:1)1:1)1:1);'

For each leaf node, I want to get a list of closely related leaf nodes using ete2 in Python.

How can I do that?

python ete2 • 3.0k views

Could you explain what you mean by "closely related"? Close leaves by branch distance, by topology...?


I mean it in this sense: how large should the topological distance be to say whether two leaves are closely related or not (using tree.get_distance(node1, node2, topology_only=True) from the ete2 package in Python)?


This will give you the number of branches that separate the two nodes. The cut-off for "closely related" is up to you, and it will depend on many factors. In general, I would say that branch length is a better proxy than topological distance (so, turn off the topology_only flag). This question is somewhat related: "Which cut-off for collapsing this tree?"
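
As a rough illustration of the cut-off idea, here is a minimal, stdlib-only Python sketch (not ete2 code) that lists, for every leaf, the other leaves within a given number of branches. It assumes a plain Newick string like the one in the question (names, :length values, internal support labels) with unique leaf names; parse_newick, leaves_within and close_relatives are hypothetical helpers written for this example. Distances are counted in edges, so two sister leaves are 2 apart; ete2's get_distance(..., topology_only=True) may count the same path differently, so calibrate the cut-off against whichever measure you actually use.

```python
import itertools
from collections import defaultdict, deque

def parse_newick(text):
    """Parse plain Newick into (adjacency dict, set of leaf names).

    Handles names, ':length' values and internal labels such as support
    values. Internal nodes get tuple ids so they never clash with leaf names.
    Recursive, so extremely deep trees may hit Python's recursion limit.
    """
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    adj, leaves = defaultdict(list), set()
    ids = itertools.count()
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        return s[start:pos]

    def skip_length():
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            read_label()  # branch length ignored: distances are counted in edges

    def node():
        nonlocal pos
        if s[pos] != "(":
            name = read_label()  # leaf names are assumed to be unique
            skip_length()
            leaves.add(name)
            return name
        nid = ("internal", next(ids))
        pos += 1  # skip '('
        while True:
            child = node()
            adj[nid].append(child)
            adj[child].append(nid)
            if s[pos] == ",":
                pos += 1
            else:
                break
        pos += 1      # skip ')'
        read_label()  # internal label (e.g. support value), ignored
        skip_length()
        return nid

    node()
    return adj, leaves

def leaves_within(adj, leaves, start, cutoff):
    """Return {leaf: edge distance} for the other leaves at most `cutoff` edges from `start`."""
    dist = {start: 0}
    queue = deque([start])
    found = {}
    while queue:
        node = queue.popleft()
        if dist[node] == cutoff:
            continue  # do not walk past the cut-off
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                if nb in leaves:
                    found[nb] = dist[nb]
                queue.append(nb)
    return found

def close_relatives(newick, cutoff):
    """Map every leaf to the sorted list of leaves within `cutoff` edges of it."""
    adj, leaves = parse_newick(newick)
    return {leaf: sorted(leaves_within(adj, leaves, leaf, cutoff))
            for leaf in sorted(leaves)}

tree = "((A:1,B:1)1:1,(C:1,(D:1,E:1)1:1)1:1);"
print(close_relatives(tree, 3))
# → {'A': ['B'], 'B': ['A'], 'C': ['D', 'E'], 'D': ['C', 'E'], 'E': ['C', 'D']}
```

Each leaf costs one breadth-first search that stops at the cut-off, so small cut-offs stay cheap even on large trees.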


I think all the branch lengths in my tree are equal to 1.


Your tree seems to be based on NCBI taxonomy IDs. A good strategy would be to group closely related leaves based on their rank in the taxonomy database (e.g. same genus/family).
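
A minimal sketch of that grouping, assuming you already have, for every leaf, a mapping from rank to taxon name (for instance extracted from an NCBI-annotated tree or a taxonomy dump; how you obtain it is left open here). group_by_rank and relatives_at_rank are hypothetical helpers, and the leaf and taxon names below are made-up placeholders that only show the data shape.

```python
from collections import defaultdict

def group_by_rank(lineages, rank):
    """Group leaves that share the same taxon at `rank` (e.g. "genus" or "family").

    `lineages` maps each leaf name to a dict {rank: taxon name}; leaves whose
    lineage lacks `rank` end up under the key None.
    """
    groups = defaultdict(list)
    for leaf, lineage in lineages.items():
        groups[lineage.get(rank)].append(leaf)
    return {taxon: sorted(members) for taxon, members in groups.items()}

def relatives_at_rank(lineages, leaf, rank):
    """Return the other leaves sharing `leaf`'s taxon at `rank` (empty if that rank is missing)."""
    taxon = lineages[leaf].get(rank)
    if taxon is None:
        return []
    return sorted(other for other, lin in lineages.items()
                  if other != leaf and lin.get(rank) == taxon)

# Toy input with made-up leaf and taxon names, only to show the data shape.
lineages = {
    "leaf1": {"family": "FamilyX", "genus": "GenusA"},
    "leaf2": {"family": "FamilyX", "genus": "GenusA"},
    "leaf3": {"family": "FamilyX", "genus": "GenusB"},
    "leaf4": {"family": "FamilyY"},
}
print(group_by_rank(lineages, "genus"))
# → {'GenusA': ['leaf1', 'leaf2'], 'GenusB': ['leaf3'], None: ['leaf4']}
print(relatives_at_rank(lineages, "leaf1", "family"))
# → ['leaf2', 'leaf3']
```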

I wrote some scripts to query the NCBI taxonomy tree that may be of interest to you: https://github.com/jhcepas/ncbi_taxonomy


Thank you.
Yes, my tree is a bifurcated version of the NCBI tree (only the Metazoan tree), with the taxonomy IDs as leaf names.
Do you think running python ./ncbi_query.py -t 9913 9031 9606 -x will help me get what I want?

9.9 years ago
jhc ★ 3.0k

With this script you can annotate your tree using the NCBI taxonomy as a reference (ncbi_query.py -x -r yourtree.nw).

This will output a new tree in extended Newick format in which every node contains NCBI information: species name, taxid, lineage track, rank, etc. You can then use ETE to load the tree and locate the nodes matching your own criteria (e.g. rank=genus).

Note that the ncbi_taxonomy program is now unmaintained, as it has been integrated into the upcoming ETE 2.3 version.


Another useful tool for these taxid-based trees is the online visualization tool at http://etetoolkit.org/treeview. It is currently connected to the ETE ncbi_taxonomy module and translates tip names on the fly.


If I want to use the NCBI genus/family, wouldn't it be easier to do so without a tree? Since each leaf in my tree is an accession, I can easily get the taxonomy string from the GenBank file and compare the leaves I want.
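
If you go that route, Biopython's GenBank parser (Bio.SeqIO with format "genbank") exposes the lineage as a list of names in record.annotations["taxonomy"]. The sketch below assumes you have already collected those lists into a dict keyed by accession, and ranks the other leaves by how many leading lineage entries they share with a query leaf. shared_lineage and most_related are hypothetical helpers, and the accessions and truncated lineages are toy placeholders.

```python
def shared_lineage(tax_a, tax_b):
    """Longest common prefix of two lineages (lists ordered from root to tip)."""
    shared = []
    for a, b in zip(tax_a, tax_b):
        if a != b:
            break
        shared.append(a)
    return shared

def most_related(taxonomies, query):
    """Return (leaves, depth): the leaves whose lineage shares the most leading
    entries with `query`'s lineage, and how many entries they share."""
    target = taxonomies[query]
    depth = {leaf: len(shared_lineage(target, tax))
             for leaf, tax in taxonomies.items() if leaf != query}
    if not depth:
        return [], 0
    best = max(depth.values())
    return sorted(leaf for leaf, d in depth.items() if d == best), best

# Toy, truncated lineages keyed by made-up accessions (illustration only).
taxonomies = {
    "ACC1": ["Eukaryota", "Metazoa", "Chordata", "Mammalia"],
    "ACC2": ["Eukaryota", "Metazoa", "Chordata", "Aves"],
    "ACC3": ["Eukaryota", "Metazoa", "Arthropoda", "Insecta"],
}
print(most_related(taxonomies, "ACC1"))  # → (['ACC2'], 3)
```

This counts shared named levels rather than ranks: GenBank lineages can differ in depth, so two pairs sharing the same number of entries may diverge at different ranks.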


Sure, that is up to you. ncbi_query.py -i -t [taxids ... ] will dump information about the given taxids.


Thanks. In any case, what is the fastest way to get the closest leaf node that does not have a specific feature?

Currently, I loop through all the leaves of the tree, select the ones that do not have this feature, record their topological distance to the node I'm testing (in a dict), and then return the one with the minimum distance. However, this is quite slow. Any ideas for a faster way?
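
One way to avoid computing a distance to every candidate is a breadth-first search that starts at the query leaf and walks outward through the tree (up to parents and down into sibling subtrees), stopping at the first distance level that contains a leaf without the feature. The sketch below is plain Python rather than ete2 code: parse_newick and nearest_leaves_without are hypothetical helpers, has_feature stands for whatever test you apply to a leaf name, and distances are counted in edges (with all branch lengths equal to 1, edge counts also order leaves by branch-length distance).

```python
import itertools
from collections import defaultdict, deque

def parse_newick(text):
    """Parse plain Newick into (adjacency dict, set of leaf names).

    Internal nodes get tuple ids so they never clash with leaf names.
    Recursive, so extremely deep trees may hit Python's recursion limit.
    """
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    adj, leaves = defaultdict(list), set()
    ids = itertools.count()
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        if s[pos] == "(":
            nid = ("internal", next(ids))
            pos += 1  # skip '('
            while True:
                child = node()
                adj[nid].append(child)
                adj[child].append(nid)
                if s[pos] != ",":
                    break
                pos += 1
            pos += 1  # skip ')'
        else:
            nid = None
        name = read_label()  # leaf name, or internal label (support) to ignore
        if pos < len(s) and s[pos] == ":":
            pos += 1
            read_label()  # branch length ignored: distances are counted in edges
        if nid is None:
            leaves.add(name)  # leaf names are assumed to be unique
            return name
        return nid

    node()
    return adj, leaves

def nearest_leaves_without(adj, leaves, start, has_feature):
    """BFS outward from leaf `start`; return (sorted leaves, edge distance) for
    the closest other leaves for which has_feature(leaf) is False."""
    dist = {start: 0}
    queue = deque([start])
    hits, best = [], None
    while queue:
        node = queue.popleft()
        if best is not None and dist[node] >= best:
            break  # any further hit would be farther than the ones already found
        for nb in adj[node]:
            if nb in dist:
                continue
            dist[nb] = dist[node] + 1
            if nb in leaves and not has_feature(nb):
                hits.append(nb)
                best = dist[nb]
            queue.append(nb)
    return sorted(hits), best

tree = "((A:1,B:1)1:1,(C:1,(D:1,E:1)1:1)1:1);"
adj, leaves = parse_newick(tree)
with_feature = {"B", "C"}  # hypothetical: leaves that carry the feature
print(nearest_leaves_without(adj, leaves, "A", lambda leaf: leaf in with_feature))
# → (['D', 'E'], 5)
```

Parse the tree once and reuse adj and leaves for every query; each search stops as soon as the nearest qualifying level is found, instead of scoring every leaf in the tree.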


So, is there no solution for this?
