Question: How do I generate all possible Newick Tree permutations for a set of species given an outgroup in Python?
2
gravatar for anon
17 months ago by
anon50
anon50 wrote:

Edit: A possible answer has been posted on SO. I need to verify it myself but it looks correct at the moment. The answer is here at: https://stackoverflow.com/a/46670077/5609349

How do I generate all possible Newick Tree permutations for a set of species given an outgroup?

For those who don't know what Newick tree format is, a good description is available at: https://en.wikipedia.org/wiki/Newick_format

I want to create all possible Newick Tree permutations for a set of species given an outgroup. The number of possible leaf nodes is either 4, 5, or 6.

Both "Soft" and "hard" polytomies are allowed. See: https://en.wikipedia.org/wiki/Polytomy#Soft_polytomies_vs._hard_polytomies https://biology.stackexchange.com/questions/23667/evidence-discussions-of-hard-polytomy

Shown below is the ideal output, with "E" set as the outgroup

Ideal Output:

((("A","B","C"),("D"),("E"));
((("A","B","D"),("C"),("E"));
((("A","C","D"),("B"),("E"));
((("B","C","D"),("A"),("E"));
((("A","B")("C","D"),("E"));
((("A","C")("B","D"),("E"));
((("B","C")("A","D"),("E"));
(("A","B","C","D"),("E"));
(((("A","B"),"C"),"D"),("E"));

However, any possible solutions that I've come with using itertools, specifically itertools.permutations, have come across the problem of equivalent output. The last idea I came up with involved the equivalent output that is shown below and I think a solution would definitely involve set.

Equivalent output:

((("C","B","A"),("D"),("E"));
((("C","A","B"),("D"),("E"));
((("A","C","B"),("D"),("E"));

Here is the start of my idea for a solution. However, I'm not really sure what to about this problem besides itertools and set for now.

import itertools

def Newick_Permutation_Generator(list_of_species, name_of_outgroup)
    permutations_list =(list(itertools.permutations(["A","B","C","D","E"])))

    for given_permutation in permutations_list:
        process(given_permutation)

Newick_Permutation_Generator(["A","B","C","D","E"], "E")
ADD COMMENTlink modified 17 months ago • written 17 months ago by anon50
1

Hello TheRealMrDoe!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/46626414

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 17 months ago by Pierre Lindenbaum118k
1

Yes, that is correct. However, I didn't receive many replies on my post on stackoverflow so I decided to post here. Also, I expected that more people would here would be familiar with Newick trees.

ADD REPLYlink written 17 months ago by anon50
1

Before you get too involved you need to make sure what you're planning is practical, as you get a lot of bifurcating trees from surprisingly few leaf nodes (and if it's bifurcating + others then it's even worse). If your plan is to do some compute for goodness of fit etc on all of these, then you might need to think carefully about it! With just 10 leaves, you'll be testing 34 million bifurcating trees...

ADD REPLYlink modified 17 months ago • written 17 months ago by george.ry1.1k

That's true. I don't plan to do goodness of fit on all of these. I'm just looking to generate permutations of Newick trees.

ADD REPLYlink written 17 months ago by anon50
1

For how many leaf nodes then, outside of this example dataset?

ADD REPLYlink written 17 months ago by george.ry1.1k

4 and 6 nodes outside of this example dataset.

ADD REPLYlink written 17 months ago by anon50
1

Are polytomies allowed?

ADD REPLYlink written 17 months ago by jrj.healey11k

No, polytomies are not allowed.

ADD REPLYlink written 17 months ago by anon50
3

The "ideal output" in your question includes polytomies. Take a look at this answer: https://stackoverflow.com/questions/17242117/what-python-code-generates-all-possible-groupings-trees-for-binary-operators.

ADD REPLYlink modified 17 months ago • written 17 months ago by a.zielezinski8.6k
1

I think a.zielezinski is also getting at the same problem that I was pushing towards.. in your idealised output you're allowing polytomies. It sounds like you're just interested in generating all possible trees as strings based purely on randomness, so the distinction between hard and soft polytomies doesn't matter if there is no actual data to base this on?

If we say that all polytomies are not allowed, you'll solve a few of your equivalencies but generate new ones.

e.g.

((("C","B","A"),("D"),("E"));
((("C","A","B"),("D"),("E"));
((("A","C","B"),("D"),("E"));

Should resolve in to 3 distinct trees per polytomy (if my quick mental sums are right). This will give you a lot more trees potentially, but you'll also have to deal with the equivalency of something like:

(((("C","A"),("B"),("D"),("E"));
(((("A","C"),("B"),("D"),("E"));

If you look at the link provided, based on your current data, you couldn't say if you had a hard polytomy or not, since these are just permutations, not based on actual phylogenetic information (unless I've missed something).

ADD REPLYlink modified 17 months ago • written 17 months ago by jrj.healey11k

I see. Thank you for the clarification. Polytomies will be allowed in my output then. Do you have any suggestions for methods for generating the trees?

Best, Andrew Livera

ADD REPLYlink modified 17 months ago • written 17 months ago by anon50

Does your code output anything at the moment? How close is it to what you wanted?

ADD REPLYlink written 17 months ago by jrj.healey11k
1

Nevermind, it looks like you got a fantastic answer over on SO.

ADD REPLYlink written 17 months ago by jrj.healey11k

For anyone else who wants to see the answer, check it out here at: https://stackoverflow.com/a/46670077/5609349

ADD REPLYlink written 17 months ago by anon50

Note that, for some reason, this solution does not seem to generate all possible results when polytomies are allowed. I found more topologies with this approach: https://stackoverflow.com/a/46707857/1878788]

ADD REPLYlink written 17 months ago by blaise.li__biostars50

That's right. Let me clarify. "Soft" polytomies are allowed but "hard" polytomies are not allowed.

I'm still a little confused on "soft" and "hard polytomies. Are both types present in the "ideal output" of my question? I think I only had "soft" polytomies in my "ideal output."

See: https://en.wikipedia.org/wiki/Polytomy#Soft_polytomies_vs._hard_polytomies https://biology.stackexchange.com/questions/23667/evidence-discussions-of-hard-polytomy

ADD REPLYlink written 17 months ago by anon50

Also, I looked at this answer: https://stackoverflow.com/questions/17242117/what-python-code-generates-all-possible-groupings-trees-for-binary-operators

It's only useful for numbers and strings, not lists.

ADD REPLYlink written 17 months ago by anon50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1719 users visited in the last hour