Question: reduce Newick tree and maintain greatest diversity in subset
4.8 years ago by
United States
voynich (20) wrote:

I have several thousand sequences that have been aligned and used to build a phylogenetic tree in Newick format. I want to find a smaller subset of X sequences such that the retained sequences are still distributed uniformly across the phylogenetic space.

So far I'm thinking about counting the branches at increasing distances from the root until the number of branches is close to X. Then, for each of these branches, I would choose one tip sequence at random and delete the rest.

I am working with Python, but any pointers would be helpful. Thanks!
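
The depth-cut idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: the tiny Newick parser handles names and branch lengths only (no quoted labels or comments), and the helper names (`parse_newick`, `clades_at_depth`, `cutoff_for`, `pick_representatives`) are made up for this example rather than taken from any library.

```python
import random


class Node:
    def __init__(self):
        self.name = ""
        self.length = 0.0   # length of the branch leading to this node
        self.children = []


def parse_newick(text):
    """Parse a simple Newick string (names and branch lengths only)."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1                      # skip "("
            while True:
                node.children.append(read_node())
                if text[pos] != ",":
                    break
                pos += 1                  # skip ","
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node.name = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            start = pos = pos + 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return read_node()


def leaves(node):
    """Names of all tips below (or at) this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def clades_at_depth(node, cutoff, depth=0.0):
    """Subtrees whose parent branch extends past `cutoff` (tips are always kept)."""
    end = depth + node.length
    if not node.children or end > cutoff:
        return [node]
    return [c for child in node.children
            for c in clades_at_depth(child, cutoff, end)]


def node_depths(node, depth=0.0):
    """Yield the root-to-node distance of every node."""
    end = depth + node.length
    yield end
    for child in node.children:
        yield from node_depths(child, end)


def cutoff_for(root, x):
    """Cutoff whose clade count is closest to x (ties go to the smaller cutoff)."""
    candidates = sorted(set(node_depths(root)))
    return min(candidates, key=lambda c: abs(len(clades_at_depth(root, c)) - x))


def pick_representatives(root, cutoff, rng=random):
    """One randomly chosen tip from each clade found at the cutoff."""
    return [rng.choice(leaves(clade)) for clade in clades_at_depth(root, cutoff)]


root = parse_newick("((A:1,B:1):2,(C:1,D:1):2,E:3);")
cutoff = cutoff_for(root, 3)
print(pick_representatives(root, cutoff))  # one of A/B, one of C/D, and E
```

Only the distinct node depths need to be tried as cutoffs, because the number of clades changes only when the cut passes a node. The clade count will not always hit X exactly, so the closest achievable count is used.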


tree phylogeny • 1.4k views
modified 4.8 years ago by Biojl (1.7k) • written 4.8 years ago by voynich (20)

Did you find a solution? Would be very interested to hear how you achieved this!

written 4.6 years ago by ONeillMB (10)
4.8 years ago by
Walnut Creek, USA
Brian Bushnell (17k) wrote:

This seems like it could be solved with an MST (minimum spanning tree)-like algorithm. In each iteration, find the shortest edge (the smallest distance between two leaves). Since that edge joins two nodes, you need to not only remove the edge but also discard one of the nodes, ideally the one closer to their mutual next-closest node.

Iterate until you have the number of nodes you want. Storing edges in a heap (or priority queue) is often useful in this kind of scenario.
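
A rough sketch of that loop, for illustration only. It assumes the pairwise leaf-to-leaf distances have already been computed and stored in a dict keyed by `frozenset` pairs. The function name `thin_leaves` is hypothetical, and "closer to the next-closest node" is interpreted here as "has the smaller distance to any other retained leaf", which is one possible reading of the suggestion above.

```python
import heapq
from itertools import combinations


def thin_leaves(names, dist, keep):
    """Greedily discard leaves until only `keep` remain.

    names: iterable of leaf names
    dist:  {frozenset((a, b)): distance} for every pair of leaves (assumed precomputed)
    """
    alive = set(names)
    # All pairwise edges in a min-heap, shortest first.
    heap = [(dist[frozenset((a, b))], a, b)
            for a, b in combinations(sorted(alive), 2)]
    heapq.heapify(heap)

    def nearest_other(x, partner):
        # Distance from x to its closest retained leaf, ignoring the partner.
        others = alive - {x, partner}
        return min((dist[frozenset((x, o))] for o in others),
                   default=float("inf"))

    while len(alive) > keep and heap:
        _, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale edge: one endpoint was already discarded
        # Drop the endpoint that sits closer to the rest of the retained leaves.
        drop = a if nearest_other(a, b) <= nearest_other(b, a) else b
        alive.discard(drop)
    return alive


# Toy example: five leaves placed on a line, so distances are easy to check by hand.
positions = {"A": 0, "B": 1, "C": 5, "D": 6, "E": 12}
dist = {frozenset((a, b)): abs(positions[a] - positions[b])
        for a, b in combinations(positions, 2)}
print(sorted(thin_leaves(positions, dist, 3)))  # ['A', 'D', 'E']
```

Stale heap entries are skipped lazily rather than deleted, which keeps the heap simple. Building all pairs is quadratic in the number of leaves, so for several thousand sequences the heap holds a few million entries.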

modified 8 months ago by RamRS (28k) • written 4.8 years ago by Brian Bushnell (17k)
4.8 years ago by
Biojl (1.7k) wrote:

You can work it out with ete2.

It's Python-based and will let you parse nodes and leaves and calculate all sorts of measures. It also has some very interesting built-in functions.
