I have been working with phylogenetic trees. I am looking for a Python library or source code that takes a multiple sequence alignment file and produces a general or random tree (like the guide tree in ClustalW) that can then serve as the starting point for further heuristics. My data files are in several formats (.phy, .aln, .fas). I have been using Biopython to read the files, but it does not do the general reconstruction I need. I have also searched widely, but the tools I found work on a file together with an initial tree that is already given, whereas I need to understand how a general tree is reconstructed in the first place.
I have also found the following code, but it hard-codes the node values, whereas I need to load the tree data dynamically from a file. The code is as follows:
node.py file contains this:
class Node:
    def __init__(self, identifier):
        self.__identifier = identifier
        self.__children = []

    @property
    def identifier(self):
        return self.__identifier

    @property
    def children(self):
        return self.__children

    def add_child(self, identifier):
        self.__children.append(identifier)
tree.py file contains this:
from node import Node

(_ROOT, _DEPTH, _BREADTH) = range(3)


class Tree:
    def __init__(self):
        self.__nodes = {}

    @property
    def nodes(self):
        return self.__nodes

    def add_node(self, identifier, parent=None):
        node = Node(identifier)
        self[identifier] = node
        if parent is not None:
            self[parent].add_child(identifier)
        return node

    def display(self, identifier, depth=_ROOT):
        children = self[identifier].children
        if depth == _ROOT:
            print("{0}".format(identifier))
        else:
            print("\t" * depth, "{0}".format(identifier))
        depth += 1
        for child in children:
            self.display(child, depth)  # recursive call

    def traverse(self, identifier, mode=_DEPTH):
        # Python generator. Loosely based on an algorithm from
        # 'Essential LISP' by John R. Anderson, Albert T. Corbett,
        # and Brian J. Reiser, page 239-241
        yield identifier
        queue = self[identifier].children
        while queue:
            yield queue[0]
            expansion = self[queue[0]].children
            if mode == _DEPTH:
                queue = expansion + queue[1:]  # depth-first
            elif mode == _BREADTH:
                queue = queue[1:] + expansion  # breadth-first

    def __getitem__(self, key):
        return self.__nodes[key]

    def __setitem__(self, key, item):
        self.__nodes[key] = item
app.py contains this:
from tree import Tree

(_ROOT, _DEPTH, _BREADTH) = range(3)

tree = Tree()
tree.add_node("Harry")  # root node
tree.add_node("Jane", "Harry")
tree.add_node("Bill", "Harry")
tree.add_node("Joe", "Jane")
tree.add_node("Diane", "Jane")
tree.add_node("George", "Diane")
tree.add_node("Mary", "Diane")
tree.add_node("Jill", "George")
tree.add_node("Carol", "Jill")
tree.add_node("Grace", "Bill")
tree.add_node("Mark", "Jane")

tree.display("Harry")
print("***** DEPTH-FIRST ITERATION *****")
for node in tree.traverse("Harry"):
    print(node)
print("***** BREADTH-FIRST ITERATION *****")
for node in tree.traverse("Harry", mode=_BREADTH):
    print(node)
I also tried an example list together with a random function. It produces a random ordering of the list every time it is called. This is how I would like a data file to be used for the tree: a random sequence is generated each time, and that data is then used dynamically to reconstruct the tree.
from random import shuffle

new = []


def shuffle_number():
    demo_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
    shuffle(demo_list)
    return demo_list


i = 0
while i < 5:
    # print(shuffle_number())
    var = shuffle_number()
    # print(var)
    new.append(var)
    i += 1
print(new)
Could someone offer some guidance?
Why would you want to use a random starting tree? That doesn’t make much sense to me.
For what it's worth, the ete3 toolkit can generate many different kinds of random trees. It uses arbitrary taxon labels by default, though. I'm not sure whether there is a way to give it specific taxon names, but it would be easy enough to edit the resulting Newick files to replace each random ID with your actual sequence IDs.
Why would a random starting tree not make sense? Sure, there could be better ones that are reasonable and fast to calculate (e.g., NJ), but this would just mean that the starting tree is unlikely. Subsequent proposed changes to the topology (in a Bayesian or ML context) would be accepted via MCMC. Please let me know if there's something I've misinterpreted.
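For anyone who wants to see the idea without a library, here is a minimal standard-library sketch of a random starting topology over your own taxon names. It is not ete3's API: the function name `random_newick` and the `seed` parameter are made up for illustration, and it assumes the labels contain no Newick special characters (parentheses, commas, colons, semicolons).

```python
import random


def random_newick(labels, seed=None):
    """Return a random rooted binary topology over `labels` as a Newick string.

    Hypothetical helper: two randomly chosen subtrees are joined under a new
    internal node until a single tree remains. No sequence data is used.
    """
    rng = random.Random(seed)
    subtrees = list(labels)
    if not subtrees:
        raise ValueError("need at least one label")
    while len(subtrees) > 1:
        # Pick two distinct positions and merge the subtrees found there.
        i, j = rng.sample(range(len(subtrees)), 2)
        merged = "({},{})".format(subtrees[i], subtrees[j])
        # Delete the higher index first so the lower index stays valid.
        for k in sorted((i, j), reverse=True):
            del subtrees[k]
        subtrees.append(merged)
    return subtrees[0] + ";"


taxa = ["A", "B", "C", "D", "E"]
print(random_newick(taxa))          # a different topology on most runs
print(random_newick(taxa, seed=1))  # reproducible for a fixed seed
```

The resulting string can be saved to a file and handed to any tool that accepts a Newick starting tree. Branch lengths are left out on purpose, since a random topology has no meaningful ones.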
Well, I think this is right: using a random tree initially is not nonsense, for the reason you have described. I have tried many Python packages, but none of them work the way I am looking for. As I am still learning Python scripting, I would like a Python solution so that I can learn how a sequence file is turned into an initial tree-like structure.
Maybe I’m missing something, but if you’re starting from an MSA as OP stated, that implicitly has some best tree or trees, such as the guide tree used to produce the alignment itself. Why would you then go back to a random tree that bears no resemblance to the data?
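To make the data-based alternative concrete, here is a rough sketch of how a guide-tree-like topology can be derived from an alignment using pairwise distances. It is a simplification under stated assumptions, not what ClustalW does internally: distances here are plain p-distances taken from the already-aligned sequences, and UPGMA stands in for NJ only because it is shorter to write. The names `p_distance` and `upgma` and the toy alignment are illustrative.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, skipping columns where either has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # assumption: no comparable sites counts as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment):
    """alignment: dict of name -> aligned sequence. Returns a Newick topology."""
    # Each cluster is identified by its Newick text; sizes counts its leaves.
    sizes = {name: 1 for name in alignment}
    dist = {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = "({},{})".format(a, b)
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            if other in pair:
                continue
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        # Drop all entries that still refer to the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TCGAACCA",
    "D": "TCGAACCT",
}
print(upgma(toy))
```

A tree built this way already reflects the data, which is the point being made above. A random tree does not, and has to rely on the later search to get there.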
As I described earlier, I am learning Python scripting and am studying how phylogenetic software performs evolutionary analysis. This is why I am trying to understand how sequence data is used to produce a tree structure without any matrix or distance formula.
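One way to picture the "no matrix, no distances" case is the sketch below. It reads only the sequence IDs from FASTA-formatted text and attaches them in shuffled order to a growing tree, recording each node's parent. It is an illustration under assumptions, not how any particular package works: the helper names, the `root` and `n1`, `n2`, ... internal labels, and the inline FASTA text are all hypothetical, and the sequences themselves are ignored, so the topology is purely random.

```python
import random


def read_fasta_ids(text):
    """Return the first word of every '>' header line in FASTA text."""
    return [
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def random_parent_map(ids, seed=None):
    """Build a random rooted binary tree as a child -> parent dictionary.

    Taxa are added one at a time in shuffled order. Each new taxon is attached
    by splitting a randomly chosen existing edge with a new internal node.
    Assumes no taxon is itself called 'root' or 'n<number>'.
    """
    rng = random.Random(seed)
    order = list(ids)
    rng.shuffle(order)
    if len(order) < 2:
        raise ValueError("need at least two taxa")
    parent = {"root": None, order[0]: "root", order[1]: "root"}
    for n, taxon in enumerate(order[2:], start=1):
        # Choose any non-root node; the edge above it gets split.
        target = rng.choice([node for node in parent if node != "root"])
        internal = "n{}".format(n)
        parent[internal] = parent[target]
        parent[target] = internal
        parent[taxon] = internal
    return parent


fasta_text = """>seq1 example description
ACGTACGT
>seq2
ACGTACGA
>seq3
TCGAACCA
>seq4
TCGAACCT
"""

for child, par in random_parent_map(read_fasta_ids(fasta_text)).items():
    print(child, "<-", par)
```

Each (child, parent) pair has the same shape as the arguments to `add_node` in the tree.py code above, so the tree could be filled from a file this way instead of by hand, as long as parents are added before their children. For real .phy or .aln files, the ID reading step could be replaced by Biopython's `AlignIO.read`, which the question already mentions using.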