How to access the score of a parsimony tree using biopython
1
0
Entering edit mode
5.7 years ago
mdsiddra ▴ 30

I have been using python (3.6) and biopython (1.72) for reconstructing parsimony tree using an aligned protein sequence file (phylip format), as follows:

aln = AlignIO.read(open('FBXO7_clustal_algn.phy'), 'phylip')
print (aln)
calculator = DistanceCalculator()
dm = calculator.get_distance(aln)
constructor = DistanceTreeConstructor()
njtree = constructor.nj(dm)
starting_tree = njtree

scorer = ParsimonyScorer()

searcher = NNITreeSearcher(scorer)
Phylo.draw_ascii(searcher)

constructor = ParsimonyTreeConstructor(searcher, starting_tree)
pars_tree = constructor.build_tree(aln)
Phylo.draw_ascii(pars_tree)

If I try to print the score of the tree using 'scorer' variable, it does'nt work. Is it possible in any way to get the parsimony score of a tree?

Biopython Python • 2.7k views
ADD COMMENT
1
Entering edit mode

When you refer to scorer are you referring to the object you created here in this line:

scorer = ParsimonyScorer()

Also what does "it doesn't work" refer to. It produces an error? It returns nothing? It gives a different score/value/returns something unexpected?

It looks like there's a method function called get_score that you can access. I think usage would be: scorer.get_score(). Not sure if it's applicable though.

ADD REPLY
3
Entering edit mode
5.7 years ago
Joe 21k

It doesn't print anything because there's nothing to print.

All scorer is is an instance of a default class at this point. You haven't passed any arguments (a tree or matrix) to the ParsimonyScorer for it to have anything to calculate on.

Also, you define constructor twice, which is probably confusing things.

Are you following the documentation online?

http://biopython.org/DIST/docs/api/Bio.Phylo.TreeConstruction.ParsimonyTreeConstructor-class.html

and

https://biopython.org/wiki/Phylo

The result of the suggestion above (for my test tree) is scorer.get_score(pars_tree, aln) = 167, but I have no idea if that's meaningful (see that you have to pass it a tree and alignment in order for there to be data to calculate on though).

==============

All the tutorial code works for me, see below:

>>> from Bio import Phylo
>>> from Bio import AlignIO
>>> from Bio.Phylo.TreeConstruction import *
>>> aln = AlignIO.read('PVC1_homologs.phy', 'phylip')
>>> print aln
SingleLetterAlphabet() alignment with 16 rows and 149 columns
MSTTPEQIAVEYPIPTYRFVVSLGDEQIPFNSVSGLDISHDVIE...QAA PAU_02775
....<abridged>
MSTTVDQIAVQYPIPTYRFVVTVGDEQMSFQSVSGLDISYDTIE...EFH PLT_02568
>>> calculator = DistanceCalculator('identity')
>>> dm = calculator.get_distance(aln)
>>> dm
DistanceMatrix(names=['PAU_02775', 'PLT_01696', ...<abridged>..., 0.03355704697986572, 0]])
>>> print(dm)
PAU_02775   0
PLT_01696   0.0134228187919 0
PAK_02606   0.0134228187919 0.0134228187919 0
...<abridged>...
 PAU_02775  PLT_01696   PAK_02606   ....
>>> constructor = DistanceTreeConstructor(calculator, 'nj')
>>> tree = constructor.build_tree(aln)
>>> scorer = ParsimonyScorer()
>>> searcher = NNITreeSearcher(scorer)
>>> constructor = ParsimonyTreeConstructor(searcher, tree)
>>> pars_tree = constructor.build_tree(aln)
>>> print pars_tree
Tree(rooted=True)
    Clade(branch_length=0)
     ...<abridged>...
          Clade(branch_length=0.000559284116331, name='PAK_02014')
                Clade(branch_length=0.00615212527964, name='PAU_02206')
>>> scorer
<Bio.Phylo.TreeConstruction.ParsimonyScorer object at 0x7f1a308cbbd0>   # See? Nothing to be done with these instances
>>> print scorer
<Bio.Phylo.TreeConstruction.ParsimonyScorer object at 0x7f1a308cbbd0>
>>> searcher
<Bio.Phylo.TreeConstruction.NNITreeSearcher object at 0x7f1a308cbb10>
>>> searcher.
searcher.__class__(         searcher.__doc__            searcher.__hash__(          searcher.__new__(           searcher.__repr__(          searcher.__str__(           searcher._get_neighbors(    searcher.search(
searcher.__delattr__(       searcher.__format__(        searcher.__init__(          searcher.__reduce__(        searcher.__setattr__(       searcher.__subclasshook__(  searcher._nni(
searcher.__dict__           searcher.__getattribute__(  searcher.__module__         searcher.__reduce_ex__(     searcher.__sizeof__(        searcher.__weakref__        searcher.scorer
>>> searcher.scorer
<Bio.Phylo.TreeConstruction.ParsimonyScorer object at 0x7f1a308cbbd0>
>>> scorer.get_score()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get_score() takes exactly 3 arguments (1 given)     # Needs some data to work on...
>>> scorer.get_score(pars_tree, aln)
167
ADD COMMENT
0
Entering edit mode

Thankyou for this helpful reply. Yes, I have been using this online. I have got score for my tree file also.

  1. But this source code results in one tree only. Can't I get more than one parsimonious tree for the same sequence file??
  2. Also I need to ask 'identity' matrix is the one used for Parsimony tree reconstruction or there are more?
  3. I can't get the length for my tree? Is there any method I can use for it??
ADD REPLY
1
Entering edit mode

In short, I don't really know. I'm no expert on parsimony trees. Someone else will have to weigh in, but here's my intuition:

  1. There will be one most parsimonious tree (this is literally what parsimony means -> 'simplest'). You may be able to find more than one equally parsimonious tree (very dependent on the input data), but the scores would obviously be the same. If you really want to explore parsimony trees, I would suggest not using BioPython for this, and instead use some dedicated phylogenetics software for building and analysing the trees.

  2. There are more model definitions you can use (identity is the default):

    >>> DistanceCalculator().models
    ['identity', 'blastn', 'trans', 'benner6', 'benner22', 'benner74', 'blosum100', 'blosum30', 'blosum35', 'blosum40', 'blosum45', 'blosum50', 'blosum55', 'blosum60', 'blosum62', 'blosum65', 'blosum70', 'blosum75', 'blosum80', 'blosum85', 'blosum90', 'blosum95', 'feng', 'fitch', 'genetic', 'gonnet', 'grant', 'ident', 'johnson', 'levin', 'mclach', 'miyata', 'nwsgappep', 'pam120', 'pam180', 'pam250', 'pam30', 'pam300', 'pam60', 'pam90', 'rao', 'risler', 'structure']
    

    They are divided in to DNA and Protein models (they can be inspected by replacing the above .models syntax with .dna_models or .protein_models . Inspecting the class source code shows that identity is defined separately from those 2 and added in directly to .models

    I have no idea what effect each will have on your tree, nor what identity is even doing specifically, I'm afraid.

  3. What do you mean by length of your tree? Farthest tips? Total number of taxa?

ADD REPLY
0
Entering edit mode

Thankyou for the useful suggestions.

  1. Although the parsimony tree is the best tree we get as a result, but is it possible to get the number of trees that have been generated before it reached the best tree?? Or may be last 5/6 trees can be printed before the best tree was reconstructed?
  2. By the length of a tree I mean the sum of the minimum numbers of substitutions over all sites for the given topology. Using Biopython how may I be able to get it?
  3. Would you recommend any efficient phylogenetics software for building and analysing the trees??
ADD REPLY
1
Entering edit mode
  1. In principle I would assume that’s possible. Whether it is easily done within Biopython is another matter. I don’t personally know (and am not currently at a computer to experiment), but if you could get intimate with the Phylo package source, you might be able to find something you can hook in to.

  2. So you want the sum of all branch lengths? These kinds of calculations are probably easiest with Dendropy or ete3.

  3. Software that springs to mind would be PAUP. Both of the previously mentioned tools can be used for tree analysis, but I don’t know if they have specific methods for parsimony trees.

ADD REPLY
0
Entering edit mode

Thankyou for the help.. !

ADD REPLY

Login before adding your answer.

Traffic: 2698 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6