Question

Looking for a maximum parsimony tree-builder with bootstrapping that accepts options passed on the command line.

1

9.6 years ago

confusedious ▴ 470

I am on the hunt for a maximum parsimony tree-building program with bootstrapping for Linux to which one can easily pass options from the command line. Does anyone have any recommendations?

In brief, the tree-builder is to be run as part of a shell-script-based pipeline that produces consensus trees for a given alignment under various conditions and finds the configuration that produces the consensus tree with the highest mean branch support value (thus the bootstrapping). My alignment has ~100 sequences of ~16 kb, though these are quite highly conserved (there are only a few hundred parsimony-informative sites). As the earlier steps of the pipeline produce alignments in FASTA format, it would be preferable that the tree-builder can read these.
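The selection step described above could be sketched in R roughly as follows. This is only an illustration under stated assumptions: it assumes the ape package is installed, that each consensus tree carries its bootstrap support as numeric internal-node labels, and the two toy Newick strings and the names `config_a`, `config_b` and `mean_support` are hypothetical stand-ins rather than anything from the actual pipeline.

```r
library(ape)

# Mean of the numeric internal-node labels of a tree.
# Nodes without a label (e.g. the root) become NA and are ignored.
mean_support <- function(tree) {
  labels <- suppressWarnings(as.numeric(tree$node.label))
  mean(labels, na.rm = TRUE)
}

# Toy stand-ins for the consensus trees of two hypothetical configurations
trees <- list(
  config_a = read.tree(text = "((A,B)90,(C,D)70,E);"),
  config_b = read.tree(text = "((A,C)60,(B,D)55,E);")
)

# Score every configuration and report the one with the highest mean support
scores <- vapply(trees, mean_support, numeric(1))
best <- names(scores)[which.max(scores)]
cat("best configuration:", best, "mean support:", scores[[best]], "\n")
```

In a real run the list would be filled with `read.tree(file = ...)` calls on the consensus trees written by each configuration.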

I have been looking at TNT so far but find its interface rather unwieldy. The fact that it opens an interpreter when the program is called is suboptimal, as I would prefer to simply run the program with options from within the shell script (i.e. user$ ./program [options] [file]). That said, if anyone has figured out how to use TNT like this, I would be grateful for your advice.

The other option I have looked at is PAUP* but I would prefer not to purchase this if there is a simpler program freely available.

Thank you for any help you can offer, Biostars folks.

phylogenetics maximum parsimony command-line linux • 5.2k views

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by confusedious ▴ 470

0

[Meant to reply - see my post below.]

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by Brice Sarver ★ 3.8k

4

9.6 years ago

Zev.Kronenberg 12k

A really good resource for finding phylogenetic software: http://evolution.genetics.washington.edu/phylip/software.html

EDIT:

The word on the street is that PAUP* is now open source!

It is worth mentioning that MrBayes is easy to run from the command line and has a really solid manual.

ADD COMMENT • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by Zev.Kronenberg 12k

1

I'm hoping for MP! For the work I am doing I am trying to remove the assumptions that are made in explicitly applying a substitution model. It's a bit of an experiment, so I am trying to remove all complicating factors. Part of what I am testing is whether the substitution models used in the likes of ML algorithms or MrBayes handle sites with very rapid nucleotide turnover. The use of MP is intended to be a kind of naive control.

ADD REPLY • link 9.6 years ago by confusedious ▴ 470

1

Should have read your post more carefully. See edits above.

ADD REPLY • link updated 2.3 years ago by Ram 43k • written 9.6 years ago by Zev.Kronenberg 12k

1

Among-site rate heterogeneity is most frequently modeled using a gamma distribution, which most phylogenetic programs describe by the shape parameter alpha alone (beta is constrained to 1/alpha). This parameter is free and can be estimated. If some sites are evolving at extreme rates, the goal is that they can be accurately described by such a distribution.
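To spell out why a single parameter suffices, here is the gamma density written with shape alpha and scale beta (the scale parameterization is an assumption on my part; with a rate parameterization the constraint reads beta = alpha instead):

```latex
f(r;\alpha,\beta) = \frac{r^{\alpha-1}\, e^{-r/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad r > 0,
\qquad
\mathbb{E}[r] = \alpha\beta, \qquad \operatorname{Var}[r] = \alpha\beta^{2}.

\text{With } \beta = 1/\alpha: \qquad
\mathbb{E}[r] = 1, \qquad \operatorname{Var}[r] = \frac{1}{\alpha}.
```

Fixing the mean relative rate at 1 leaves alpha as the only free quantity: a small alpha means large variance (strong rate heterogeneity, a few very fast sites), while a large alpha means nearly equal rates across sites.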

It is worth noting that maximum parsimony has its own set of conditions where it produces inconsistent results. I would recommend either using a likelihood-based approach under an estimated model of evolution or a distance-based approach (such as neighbor joining) with genetic distances corrected under a model.

ADD REPLY • link 9.6 years ago by Brice Sarver ★ 3.8k

0

Thanks Brice. Without getting into the nitty-gritty of what I'm up to (the wheels are turning on a paper), I am exploring alternative methods for dealing with rate heterogeneity. The use of MP was chosen here for the very reason that it does not attempt to correct for this. I will of course be using other tree methods on the same data too, including NJ, ML and BI with various models of evolution. I am benchmarking my approach against these.

ADD REPLY • link 9.6 years ago by confusedious ▴ 470

1

Do you have a new parsimony-based approach, perhaps? Something that avoids the perils of long-branch attraction?

Edit: I ask, because some would argue that the step matrices should be corrected in order to account for some of these problems, tying it back in with your original problem of extreme heterogeneity (and hence the direct answer to your original question!).

ADD REPLY • link 9.6 years ago by Brice Sarver ★ 3.8k

0

The problem of misinterpretation of homoplasy as synapomorphy is minimised in the method I am trialling. As such it is hoped that the problem of LBA will be reduced. Also, the taxa I am working with are not all that distantly related (pairwise identity between all samples is better than 95%). You are right to raise this issue - I'll see how the method performs in this regard.

ADD REPLY • link 9.6 years ago by confusedious ▴ 470

0

If PAUP* is now open source, I will be very happy. The command-line usage of that program is much more straightforward, as I understand things.

ADD REPLY • link 9.6 years ago by confusedious ▴ 470

score 5 · Accepted Answer · 2014-09-16

Zev referred me to this question, which provided me the impetus for registering on Biostars.

As he mentioned, PAUP* is now open source. However, I cannot find where Dave Swofford was posting the change log. The worst part is that I know I've been there before - I recall him saying that Sinauer will have the license for the GUI but the command-line version will be free, which is what you want. I posted a comment to Zev's response because I was still looking for the URL; I'll post it if I can find it (perhaps it's down right now?).

Anyway, you may consider R: Klaus Schliep's phangorn package (http://cran.r-project.org/web/packages/phangorn/phangorn.pdf) will estimate phylogenies using MP as an optimality criterion. If you need a complete command-line interface for some other stuff in a pipeline, I recommend writing a script and calling it using

Rscript --vanilla your_script.R [arguments]

When called this way, R, like other languages, will accept command-line arguments.
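As a rough illustration of what such a script might look like, here is a minimal sketch, not a tested pipeline component. It assumes the phangorn and ape packages are installed and that the input is a FASTA DNA alignment. The file name `mp_bootstrap.R`, the helper names `parse_args` and `run_mp_bootstrap`, and the argument order are all hypothetical. The tree search uses phangorn's parsimony ratchet (`pratchet`), which is a heuristic, so it is not guaranteed to find the most parsimonious tree.

```r
# mp_bootstrap.R (hypothetical name)
# Usage: Rscript --vanilla mp_bootstrap.R alignment.fasta [n_bootstrap] [output.tre]
suppressPackageStartupMessages({
  library(ape)
  library(phangorn)
})

# Turn the raw command-line arguments into named settings with defaults
parse_args <- function(args) {
  list(
    aln_file = args[1],
    n_boot   = if (length(args) >= 2) as.integer(args[2]) else 100L,
    out_file = if (length(args) >= 3) args[3] else "mp_consensus.tre"
  )
}

run_mp_bootstrap <- function(aln_file, n_boot, out_file) {
  # Read the FASTA alignment as DNA characters
  aln <- read.phyDat(aln_file, format = "fasta", type = "DNA")

  # Heuristic MP search (parsimony ratchet) applied to each bootstrap replicate
  mp_search <- function(x) pratchet(x, trace = 0)

  # Resample alignment columns n_boot times and search each replicate
  bs_trees <- bootstrap.phyDat(aln, FUN = mp_search, bs = n_boot)

  # Majority-rule consensus, labelled with percent bootstrap support
  cons <- ape::consensus(bs_trees, p = 0.5)
  support <- 100 * prop.clades(cons, bs_trees) / length(bs_trees)
  cons$node.label <- round(support)
  write.tree(cons, file = out_file)

  # The first internal node is the root (trivially supported), so drop it
  mean(support[-1], na.rm = TRUE)
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1) {
  opts <- parse_args(args)
  mean_bs <- run_mp_bootstrap(opts$aln_file, opts$n_boot, opts$out_file)
  # Print the mean support so a calling shell script can capture it
  cat(mean_bs, "\n")
} else {
  message("usage: Rscript --vanilla mp_bootstrap.R alignment.fasta [n_bootstrap] [output.tre]")
}
```

Because the mean support is printed to standard output, a shell pipeline could capture it with command substitution and compare it across configurations. If the consensus collapses to a star tree there are no non-root clades, and the printed value will be NaN.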

Hope this helps!

[Edit: typos]