How do I generate all possible Newick Tree permutations for a set of species given an outgroup in Python?
0
2
Entering edit mode
4.9 years ago

Edit: A possible answer has been posted on SO. I need to verify it myself but it looks correct at the moment. The answer is here at: https://stackoverflow.com/a/46670077/5609349

How do I generate all possible Newick Tree permutations for a set of species given an outgroup?

For those who don't know what Newick tree format is, a good description is available at: https://en.wikipedia.org/wiki/Newick_format

I want to create all possible Newick Tree permutations for a set of species given an outgroup. The number of possible leaf nodes is either 4, 5, or 6.

Both "Soft" and "hard" polytomies are allowed. See: https://en.wikipedia.org/wiki/Polytomy#Soft_polytomies_vs._hard_polytomies https://biology.stackexchange.com/questions/23667/evidence-discussions-of-hard-polytomy

Shown below is the ideal output, with "E" set as the outgroup

Ideal Output:

((("A","B","C"),("D"),("E"));
((("A","B","D"),("C"),("E"));
((("A","C","D"),("B"),("E"));
((("B","C","D"),("A"),("E"));
((("A","B")("C","D"),("E"));
((("A","C")("B","D"),("E"));
((("B","C")("A","D"),("E"));
(("A","B","C","D"),("E"));
(((("A","B"),"C"),"D"),("E"));


However, any possible solutions that I've come with using itertools, specifically itertools.permutations, have come across the problem of equivalent output. The last idea I came up with involved the equivalent output that is shown below and I think a solution would definitely involve set.

Equivalent output:

((("C","B","A"),("D"),("E"));
((("C","A","B"),("D"),("E"));
((("A","C","B"),("D"),("E"));


Here is the start of my idea for a solution. However, I'm not really sure what to about this problem besides itertools and set for now.

import itertools

def Newick_Permutation_Generator(list_of_species, name_of_outgroup)
permutations_list =(list(itertools.permutations(["A","B","C","D","E"])))

for given_permutation in permutations_list:
process(given_permutation)

Newick_Permutation_Generator(["A","B","C","D","E"], "E")

Python Newick phylogenetics tree phylogeny • 3.5k views
1
Entering edit mode

Hello TheRealMrDoe!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/46626414

This is typically not recommended as it runs the risk of annoying people in both communities.

1
Entering edit mode

Yes, that is correct. However, I didn't receive many replies on my post on stackoverflow so I decided to post here. Also, I expected that more people would here would be familiar with Newick trees.

1
Entering edit mode

Before you get too involved you need to make sure what you're planning is practical, as you get a lot of bifurcating trees from surprisingly few leaf nodes (and if it's bifurcating + others then it's even worse). If your plan is to do some compute for goodness of fit etc on all of these, then you might need to think carefully about it! With just 10 leaves, you'll be testing 34 million bifurcating trees...

0
Entering edit mode

That's true. I don't plan to do goodness of fit on all of these. I'm just looking to generate permutations of Newick trees.

1
Entering edit mode

For how many leaf nodes then, outside of this example dataset?

0
Entering edit mode

4 and 6 nodes outside of this example dataset.

1
Entering edit mode

Are polytomies allowed?

0
Entering edit mode

No, polytomies are not allowed.

3
Entering edit mode

The "ideal output" in your question includes polytomies. Take a look at this answer: https://stackoverflow.com/questions/17242117/what-python-code-generates-all-possible-groupings-trees-for-binary-operators.

1
Entering edit mode

I think a.zielezinski is also getting at the same problem that I was pushing towards.. in your idealised output you're allowing polytomies. It sounds like you're just interested in generating all possible trees as strings based purely on randomness, so the distinction between hard and soft polytomies doesn't matter if there is no actual data to base this on?

If we say that all polytomies are not allowed, you'll solve a few of your equivalencies but generate new ones.

e.g.

((("C","B","A"),("D"),("E"));
((("C","A","B"),("D"),("E"));
((("A","C","B"),("D"),("E"));


Should resolve in to 3 distinct trees per polytomy (if my quick mental sums are right). This will give you a lot more trees potentially, but you'll also have to deal with the equivalency of something like:

(((("C","A"),("B"),("D"),("E"));
(((("A","C"),("B"),("D"),("E"));


If you look at the link provided, based on your current data, you couldn't say if you had a hard polytomy or not, since these are just permutations, not based on actual phylogenetic information (unless I've missed something).

0
Entering edit mode

I see. Thank you for the clarification. Polytomies will be allowed in my output then. Do you have any suggestions for methods for generating the trees?

0
Entering edit mode

Does your code output anything at the moment? How close is it to what you wanted?

1
Entering edit mode

Nevermind, it looks like you got a fantastic answer over on SO.

0
Entering edit mode

For anyone else who wants to see the answer, check it out here at: https://stackoverflow.com/a/46670077/5609349

0
Entering edit mode

Note that, for some reason, this solution does not seem to generate all possible results when polytomies are allowed. I found more topologies with this approach: https://stackoverflow.com/a/46707857/1878788]

0
Entering edit mode

That's right. Let me clarify. "Soft" polytomies are allowed but "hard" polytomies are not allowed.

I'm still a little confused on "soft" and "hard polytomies. Are both types present in the "ideal output" of my question? I think I only had "soft" polytomies in my "ideal output."

0
Entering edit mode

It's only useful for numbers and strings, not lists.