Question: Looking for a maximum parsimony tree-builder with bootstrapping that can be run with options passed from the command line.
confusedious wrote (5.0 years ago):

I am on the hunt for a maximum parsimony tree-building program with bootstrapping for Linux that I can easily pass options to from the command line. Does anyone have any recommendations?

In brief, the tree-builder is to be run as part of a shell-script-based pipeline that produces consensus trees for a given alignment under various conditions and finds the configuration that produces the consensus tree with the highest mean branch support value (hence the bootstrapping). My alignment has ~100 sequences of ~16 kb, though these are quite highly conserved (only a few hundred sites are parsimony-informative). As the earlier steps of the pipeline produce alignments in .fasta format, it would be preferable if the tree-builder could read these.
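Because only a few hundred of the ~16 kb columns are parsimony-informative, it can be useful for the pipeline to count them for each alignment it produces. A minimal Python sketch, assuming an aligned FASTA file and treating gaps, N and ? as missing data (a choice; gaps could instead be coded as a fifth state). A site counts as parsimony-informative when at least two character states each occur in at least two sequences. The function names are hypothetical.

```python
from collections import Counter

MISSING = set("-N?")  # assumption: gaps, N and ? are treated as missing data


def parse_fasta(lines):
    """Parse FASTA lines (e.g. an open file) into {name: upper-case sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the first word of the header as the name
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def is_informative(column):
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(seqs):
    """Number of parsimony-informative columns in an aligned {name: sequence} dict."""
    aligned = list(seqs.values())
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences are not all the same length")
    return sum(is_informative(col) for col in zip(*aligned))
```

For example, count_informative_sites(parse_fasta(open("alignment.fasta"))) would report the count for one alignment (the file name is a placeholder).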

I have been looking at TNT so far but find its interface rather unwieldy. The fact that it opens an interpreter when the program is called is sub-optimal, as I would prefer to simply run the program with options from the command line inside the shell script (i.e. user$ ./program [options] [file]). That said, if anyone has figured out how to use TNT like this, I would be grateful for your advice.

The other option I have looked at is PAUP* but I would prefer not to purchase this if there is a simpler program freely available.

Thank you for any help you can offer, Biostars folks.


[Meant to reply - see my post below.]


Reply by Brice Sarver, 5.0 years ago.
Brice Sarver (United States) wrote (5.0 years ago):

Zev referred me to this question, which provided me the impetus for registering on Biostars.

As he mentioned, PAUP* is now open source. However, I cannot find where Dave Swofford was posting the change log. The worst part is that I know I've been there before - I recall him saying that Sinauer will have the license for the GUI but the command-line version will be free, which is what you want. I posted a comment to Zev's response because I was still looking for the URL; I'll post it if I can find it (perhaps it's down right now?).

Anyway, you may consider R: Klaus Schliep's phangorn library can estimate phylogenies using MP as an optimality criterion. If you need a complete command-line interface for other steps in a pipeline, I recommend writing a script and calling it using

Rscript --vanilla your_script.R [arguments]

When called this way, R, like other languages, will accept command-line arguments.
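In R, the script can retrieve those arguments with commandArgs(trailingOnly = TRUE). For comparison, here is a sketch of the same pattern in Python using the standard-library argparse module; the option names (--replicates, --seed, --out) are hypothetical choices for one pipeline step, not flags of Rscript, phangorn or any other tool.

```python
import argparse


def build_parser():
    # Hypothetical options for one pipeline step; not the flags of any existing tool.
    parser = argparse.ArgumentParser(description="Run an MP bootstrap analysis on one alignment")
    parser.add_argument("alignment", help="path to an aligned FASTA file")
    parser.add_argument("--replicates", type=int, default=100, help="number of bootstrap replicates")
    parser.add_argument("--seed", type=int, default=None, help="random seed for reproducible resampling")
    parser.add_argument("--out", default="consensus.tre", help="where to write the consensus tree")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing.
    args = build_parser().parse_args(argv)
    # ... load args.alignment and run the tree search here ...
    return args
```

Calling main() under an if __name__ == "__main__": guard would let the shell script run something like python mp_step.py aln.fasta --replicates 500 --seed 1 (script and file names are placeholders).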

Hope this helps!

[Edit: typos]


I will look into this library. Thank you very much.

And if PAUP* is freely available I will be very happy indeed!

Edit: I have been checking out 'phangorn' in R. It looks very good, and it might just do what I was hoping for (I also know my way around R, which makes life a bit easier).

Edit 2: Phangorn is doing what I want. Starting with an NJ tree and then using the MP ratchet to explore tree space further seems adequate at this point. I'm having some trouble with the bootstrapping, but I'll figure that out.
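Bootstrapping amounts to resampling alignment columns with replacement, rebuilding a tree from each pseudo-replicate, and recording how often each split (branch) recurs; averaging those percentages over a consensus tree gives the mean branch support the pipeline compares across configurations. A minimal Python sketch that leaves the tree search itself out; the function names are hypothetical, and splits are assumed to be written in a canonical orientation so that identical branches compare equal.

```python
import random
from collections import Counter


def bootstrap_alignment(seqs, rng):
    """One pseudo-replicate: alignment columns resampled with replacement."""
    names = list(seqs)
    length = len(seqs[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seqs[name][i] for i in cols) for name in names}


def bootstrap_replicates(seqs, n, seed=None):
    """n pseudo-replicates from a seeded generator, so runs are reproducible."""
    rng = random.Random(seed)
    return [bootstrap_alignment(seqs, rng) for _ in range(n)]


def split_support(replicate_splits):
    """Percent of replicate trees containing each split.

    Each tree is an iterable of splits; a split is a frozenset of the taxa on
    one side, assumed to be written consistently (e.g. always the side without
    a fixed reference taxon) so identical splits compare equal.
    """
    counts = Counter(s for splits in replicate_splits for s in set(splits))
    return {s: 100.0 * c / len(replicate_splits) for s, c in counts.items()}


def mean_support(tree_splits, support):
    """Mean bootstrap support over the internal branches of one tree."""
    values = [support.get(s, 0.0) for s in tree_splits]
    return sum(values) / len(values)
```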

Reply by confusedious, 5.0 years ago.
Zev.Kronenberg (United States) wrote (5.0 years ago):

A really good resource for finding phylogenetic software:


The word on the street is that PAUP* is now open source!

It is worth mentioning that MrBayes is easy to run from the command line and has a really solid manual.





I'm hoping for MP! For the work I am doing, I am trying to remove the assumptions that come with explicitly applying a substitution model; it's a bit of an experiment, so I am trying to remove all complicating factors. Part of what I am testing is whether the substitution models used in ML methods or MrBayes handle sites with very rapid nucleotide turnover. MP is intended as a kind of naive control.
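For reference, the MP criterion involves no substitution model: a tree is scored by the minimum number of character changes it requires. A minimal sketch of Fitch's algorithm for a single fixed, rooted binary tree, assuming unambiguous nucleotides (no gaps or IUPAC codes) and trees written as nested tuples of taxon names; it only scores a given topology and does not search tree space the way a parsimony ratchet does.

```python
def fitch_site(tree, states):
    """Fitch pass for one site: return (candidate state set, number of changes).

    tree: a taxon name (leaf) or a 2-tuple (left, right) of subtrees.
    states: {taxon: character at this site}.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left, left_changes = fitch_site(tree[0], states)
    right, right_changes = fitch_site(tree[1], states)
    shared = left & right
    if shared:
        return shared, left_changes + right_changes
    # No shared state: carry the union upwards and count one extra change.
    return left | right, left_changes + right_changes + 1


def parsimony_score(tree, seqs):
    """Total minimum number of changes over all sites of the alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in seqs.items()})[1]
        for i in range(length)
    )
```

The most parsimonious of several candidate topologies is the one with the lowest score; moving the root elsewhere does not change the Fitch score.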

Reply by confusedious, 5.0 years ago.

Should have read your post more carefully. See edits above.

Reply by Zev.Kronenberg, 5.0 years ago.

Among-site rate heterogeneity is most frequently modeled using a gamma distribution, which most phylogenetic programs describe by the shape parameter alpha alone (the mean rate is fixed at 1, so the scale parameter beta is constrained to 1/alpha). This parameter is free and can be estimated. If some sites are evolving at extreme rates, the hope is that such a distribution can describe them accurately.
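Written out, with shape alpha and scale beta = 1/alpha, the density of the relative rate r at a site is given below. Small alpha (below 1) gives an L-shaped distribution, with most sites nearly invariant and a few very fast, while large alpha approaches equal rates across sites. In practice, programs usually approximate this with a small number of discrete rate categories.

```latex
\begin{aligned}
f(r \mid \alpha) &= \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha-1} e^{-\alpha r}, \qquad r > 0,\\
\mathbb{E}[r] &= 1, \qquad \operatorname{Var}(r) = \frac{1}{\alpha}.
\end{aligned}
```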

It is worth noting that maximum parsimony has its own set of conditions under which it produces inconsistent results (long-branch attraction being the classic case). I would recommend either using a likelihood-based approach under an estimated model of evolution or a distance-based approach (such as neighbor joining) with genetic distances corrected under a model.
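For the distance-based route, the simplest model correction is Jukes-Cantor (JC69), d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites. A minimal sketch for illustration, assuming gaps and ambiguous bases are skipped pair by pair; in practice a richer model (possibly with gamma-distributed rates) would usually be estimated, and the resulting distance matrix passed to NJ.

```python
import math

MISSING = set("-N?")  # assumption: gaps and ambiguous bases are skipped pairwise


def p_distance(s1, s2):
    """Proportion of differing sites among positions where both sequences have data."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a not in MISSING and b not in MISSING]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(s1, s2):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        raise ValueError("p >= 0.75: JC69 correction undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```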

Reply by Brice Sarver, 5.0 years ago.

Thanks Brice. Without getting into the nitty-gritty of what I'm up to (the wheels are turning on a paper), I am exploring alternative methods for dealing with rate heterogeneity. The use of MP was chosen here for the very reason that it does not attempt to correct for this. I will of course be using other tree methods on the same data too, including NJ, ML and BI with various models of evolution. I am benchmarking my approach against these.

Reply by confusedious, 5.0 years ago.

Do you have a new parsimony-based approach, perhaps? Something that avoids the perils of long-branch attraction?

Edit: I ask, because some would argue that the step matrices should be corrected in order to account for some of these problems, tying it back in with your original problem of extreme heterogeneity (and hence the direct answer to your original question!).
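Step-matrix (weighted) parsimony replaces the count of changes with a cost for each pair of states. A minimal sketch of Sankoff's algorithm for one fixed tree, assuming unambiguous nucleotides and trees written as nested tuples of taxon names; with every off-diagonal cost equal to 1 it reproduces the ordinary unweighted parsimony score. The transition = 1, transversion = 2 matrix is purely illustrative, not a recommended weighting.

```python
import math

BASES = "ACGT"


def sankoff_site(tree, states, step):
    """Return {base: minimum weighted cost of the subtree if its root has that base}.

    tree: a taxon name (leaf) or a tuple of subtrees; states: {taxon: base}.
    step[x][y]: cost of a change from base x to base y (the step matrix).
    """
    if not isinstance(tree, tuple):
        return {b: 0.0 if b == states[tree] else math.inf for b in BASES}
    children = [sankoff_site(child, states, step) for child in tree]
    return {
        x: sum(min(child[y] + step[x][y] for y in BASES) for child in children)
        for x in BASES
    }


def weighted_parsimony_score(tree, seqs, step):
    """Sum over sites of the cheapest root assignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        min(sankoff_site(tree, {t: s[i] for t, s in seqs.items()}, step).values())
        for i in range(length)
    )


# Illustrative step matrix: transitions (A<->G, C<->T) cost 1, transversions cost 2.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}
TS_TV_STEP = {
    x: {y: 0 if x == y else (1 if frozenset((x, y)) in TRANSITIONS else 2) for y in BASES}
    for x in BASES
}
```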

Reply by Brice Sarver, 5.0 years ago.

The problem of misinterpretation of homoplasy as synapomorphy is minimised in the method I am trialling. As such it is hoped that the problem of LBA will be reduced. Also, the taxa I am working with are not all that distantly related (pairwise identity between all samples is better than 95%). You are right to raise this issue - I'll see how the method performs in this regard.

Reply by confusedious, 5.0 years ago.

If PAUP* is now open source, I will be very happy. As I understand it, the command-line usage of that program is much more straightforward.

Reply by confusedious, 5.0 years ago.