Question: Reconstruction of a generic tree using dynamic data with python
mdsiddra wrote (2.3 years ago):

I have been working with phylogenetic trees. Is there a Python library or source code that can take multiple sequence alignment files and produce a general/random tree (like the guide tree in ClustalW) that can then be used for further heuristics? I have data files in different formats (.phy/.aln/.fas). I have been using Biopython to read the files, but it does not do the kind of general reconstruction I need. I have also searched in many places, but the tools I found work on a file plus an initial tree that is already given, whereas I need to understand how such a general tree is reconstructed in the first place.

I have also found the following code, but it does not take the node values dynamically; I need to load the tree data dynamically from a file. The code is as follows:

node.py file contains this:

class Node:
    def __init__(self, identifier):
        self.__identifier = identifier
        self.__children = []

    @property
    def identifier(self):
        return self.__identifier

    @property
    def children(self):
        return self.__children

    def add_child(self, identifier):
        self.__children.append(identifier)

tree.py file contains this:

from node import Node

(_ROOT, _DEPTH, _BREADTH) = range(3)


class Tree:

    def __init__(self):
        self.__nodes = {}

    @property
    def nodes(self):
        return self.__nodes

    def add_node(self, identifier, parent=None):
        node = Node(identifier)
        self[identifier] = node

        if parent is not None:
            self[parent].add_child(identifier)

        return node

    def display(self, identifier, depth=_ROOT):
        children = self[identifier].children
        if depth == _ROOT:
            print("{0}".format(identifier))
        else:
            print("\t"*depth, "{0}".format(identifier))

        depth += 1
        for child in children:
            self.display(child, depth)  # recursive call

    def traverse(self, identifier, mode=_DEPTH):
        # Python generator. Loosely based on an algorithm from
        # 'Essential LISP' by John R. Anderson, Albert T. Corbett,
        # and Brian J. Reiser, pages 239-241
        yield identifier
        queue = self[identifier].children
        while queue:
            yield queue[0]
            expansion = self[queue[0]].children
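            if mode == _DEPTH:
                queue = expansion + queue[1:]  # depth-first
            elif mode == _BREADTH:
                queue = queue[1:] + expansion  # breadth-first
            else:
                # Without this, an unknown mode would loop forever.
                raise ValueError("mode must be _DEPTH or _BREADTH")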

    def __getitem__(self, key):
        return self.__nodes[key]

    def __setitem__(self, key, item):
        self.__nodes[key] = item

app.py contains this:

from tree import Tree, _BREADTH  # reuse the mode constant defined in tree.py

tree = Tree()

tree.add_node("Harry")  # root node
tree.add_node("Jane", "Harry")
tree.add_node("Bill", "Harry")
tree.add_node("Joe", "Jane")
tree.add_node("Diane", "Jane")
tree.add_node("George", "Diane")
tree.add_node("Mary", "Diane")
tree.add_node("Jill", "George")
tree.add_node("Carol", "Jill")
tree.add_node("Grace", "Bill")
tree.add_node("Mark", "Jane")

tree.display("Harry")
print("***** DEPTH-FIRST ITERATION *****")
for node in tree.traverse("Harry"):
    print(node)
print("***** BREADTH-FIRST ITERATION *****")
for node in tree.traverse("Harry", mode=_BREADTH):
    print(node)
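
One way to fill such a tree from a file instead of hard-coded add_node calls is sketched below. It is a minimal, self-contained sketch and not part of the code above: it assumes a hypothetical text format with one "child parent" pair per line (a line with a single name is the root), and the helper names read_edges, build_children and depth_first are made up for illustration.

```python
import io
from collections import defaultdict


def read_edges(handle):
    """Yield (child, parent) pairs from lines such as 'Jane Harry'.

    A line holding a single name is treated as the root (parent is None).
    """
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        child = fields[0]
        parent = fields[1] if len(fields) > 1 else None
        yield child, parent


def build_children(edges):
    """Return (root, children), where children maps a node to its child list."""
    children = defaultdict(list)
    root = None
    for child, parent in edges:
        if parent is None:
            root = child
        else:
            children[parent].append(child)
    return root, children


def depth_first(root, children):
    """Yield node names in depth-first (pre-order) order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))


# io.StringIO stands in for open("edges.txt"); the file name is hypothetical.
demo = io.StringIO("Harry\nJane Harry\nBill Harry\nJoe Jane\nGrace Bill\n")
root, children = build_children(read_edges(demo))
order = list(depth_first(root, children))
print(order)
```

Replacing the StringIO object with an open file handle would read the same pairs from disk; the traversal does not change.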

I used an example list and a shuffle function with it. It generates a random ordering of the list every time. This is how I want a data file to be used for the tree: a random sequence is generated every time, and this data is used dynamically to reconstruct the tree.

from random import shuffle

def shuffle_number():
    # Return the example labels in a new random order.
    demo_list = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
    shuffle(demo_list)
    return demo_list

# Collect five independent random orderings.
new = []
for _ in range(5):
    new.append(shuffle_number())
print(new)

Could I get some guidance?
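
The shuffling idea can be carried one step further to produce an actual random tree from the sequence names in a file. The sketch below is only an illustration under stated assumptions: it reads IDs from FASTA-style input, repeatedly joins two randomly chosen subtrees until one rooted binary tree remains, and prints it in Newick format. It ignores the sequences themselves, makes no claim about how topologies are distributed, and is not how ClustalW builds its guide tree (that relies on pairwise distances). The function names are hypothetical.

```python
import io
import random


def read_fasta_ids(handle):
    """Return the sequence IDs (first word after '>') from FASTA-style input."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.append(fields[0])
    return ids


def random_topology(labels, rng=random):
    """Join two randomly chosen subtrees until a single binary tree remains.

    Leaves are strings; internal nodes are (left, right) tuples.
    """
    subtrees = list(labels)
    if not subtrees:
        raise ValueError("need at least one label")
    while len(subtrees) > 1:
        rng.shuffle(subtrees)
        right = subtrees.pop()
        left = subtrees.pop()
        subtrees.append((left, right))
    return subtrees[0]


def to_newick(tree):
    """Serialise the nested-tuple tree as a Newick string (no branch lengths)."""
    if isinstance(tree, tuple):
        return "(" + ",".join(to_newick(child) for child in tree) + ")"
    return tree


def leaves(tree):
    """Yield the leaf labels of the nested-tuple tree."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree


# io.StringIO stands in for an open alignment file in FASTA format.
demo = io.StringIO(">A demo\nACGT\n>B\nACGA\n>C\nACTA\n>D\nGCTA\n")
ids = read_fasta_ids(demo)
tree = random_topology(ids, random.Random(1))
newick = to_newick(tree) + ";"
print(newick)
```

Passing a seeded random.Random instance makes a run repeatable; leaving the default gives a different tree each time, as with the shuffled lists above.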

Tags: phylogenetics, python • 1.5k views
written 2.3 years ago by mdsiddra • modified 2.3 years ago by Brice Sarver

Why would you want to use a random starting tree? That doesn’t make much sense to me.

For what it's worth, the ete3 toolkit can generate many different kinds of random trees. It uses arbitrary taxon labels by default, though. I'm not sure if there is a way to give it specific taxon names, but it would be easy enough to edit the resulting Newick files to replace each random ID with your actual sequence IDs.

written 2.3 years ago by Joe
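
The relabelling step suggested in the comment above could look roughly like this. It is a sketch using only the standard library, not an ete3 feature: the placeholder names t1, t2, t3 and the function relabel_newick are hypothetical, and the regular expression assumes unquoted labels with no whitespace after "(" or ",".

```python
import re

# A leaf label directly follows "(" or ","; internal-node labels and
# branch lengths follow ")" or ":" and are therefore left untouched.
LEAF_LABEL = re.compile(r"(?<=[(,])([^(),:;\s]+)")


def relabel_newick(newick, mapping):
    """Replace leaf labels in a Newick string using an {old: new} mapping.

    Labels missing from the mapping are kept as they are.
    """
    return LEAF_LABEL.sub(lambda m: mapping.get(m.group(1), m.group(1)), newick)


random_tree = "((t1:0.1,t2:0.2):0.05,t3:0.3);"
names = {"t1": "seqA", "t2": "seqB", "t3": "seqC"}
relabelled = relabel_newick(random_tree, names)
print(relabelled)
```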

Why would a random starting tree not make sense? Sure, there could be better ones that are reasonable and fast to calculate (e.g., NJ), but this would just mean that the original tree is unlikely. Subsequent proposed changes to the topology (in a Bayesian or ML context) would be accepted via MCMC. Please let me know if there's something I've misinterpreted.

written 2.3 years ago by Brice Sarver
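
The acceptance step mentioned in the comment above can be illustrated with the standard Metropolis rule. This is a sketch under simplifying assumptions (a symmetric proposal and a flat prior over topologies, so the acceptance ratio reduces to a likelihood ratio); the log-likelihood values are made-up numbers, and no particular phylogenetics package is claimed to implement it this way.

```python
import math
import random


def metropolis_accept(loglik_current, loglik_proposed, rng=random):
    """Accept a proposed tree with probability min(1, L_proposed / L_current).

    Works on log-likelihoods to avoid underflow with tiny likelihoods.
    """
    log_ratio = loglik_proposed - loglik_current
    if log_ratio >= 0:
        return True  # a tree that fits at least as well is always accepted
    # A worse tree is accepted with probability exp(log_ratio) < 1.
    return rng.random() < math.exp(log_ratio)


rng = random.Random(0)
trials = 10000
# A proposal one log-unit worse should be accepted about exp(-1) of the time.
accepted = sum(metropolis_accept(-100.0, -101.0, rng) for _ in range(trials))
print(accepted / trials)
```

Because worse trees are still accepted now and then, a chain started from an unlikely random tree can move away from it rather than getting stuck.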

Well, I guess this is right: using a random initial tree is not nonsense, for the reason you have described. I have tried many Python packages, but they do not work the way I need. As I am learning Python scripting, I would like a Python solution so that I can learn how a sequence file is turned into an initial tree-like structure.

written 2.3 years ago by mdsiddra

Maybe I’m missing something, but if you’re starting from an MSA as OP stated, that implicitly has some best tree or trees, such as the guide tree used to produce the alignment itself. Why would you then go back to a random tree that bears no resemblance to the data?

written 2.3 years ago by Joe

As I described earlier, I am learning Python scripting and how phylogenetic software performs evolutionary analysis. This is why I am trying to understand how sequence data can be used to produce a tree structure without any matrix or distance formula.

written 2.3 years ago by mdsiddra
Brice Sarver (United States) wrote, 2.3 years ago:

For a completely Python-based solution, I'd start with DendroPy.
