Question

Parallel Phylogenetic Tree Generation

5

15.1 years ago

Hanif Khalak ★ 1.3k

I am trying some large-scale viral phylogenies with thousands of gene DNA sequences, each almost 2 kb in length, using a parallel version of ClustalW coded for an SMP machine.

I don't have access to a large cluster, but on the 16-core machine I'm using I found that most of the processing time is spent not in the pairwise alignment but in the tree building, where only one CPU is used.

One run with ~10K sequences had not completed even after a couple of months. I had to reboot, but only because of a power test. Go Linux!

Any suggestions as to alternatives that accelerate the tree generation?

phylogenetics parallel tree clustalw • 5.8k views

ADD COMMENT • link updated 15.1 years ago by Paulo Nuin ★ 3.7k • written 15.1 years ago by Hanif Khalak ★ 1.3k

score 10 · Answer 1 · 2010-05-29

10

15.1 years ago

Paulo Nuin ★ 3.7k

First things first: don't use ClustalW for tree generation. It's an alignment program, and its Neighbour Joining implementation is not as good as some others available. Second, thousands of sequences will take a long time even with an NJ approach. Just calculate the number of possible arrangements; there's no magic bullet here.
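To put a number on the possible arrangements: for n labelled sequences there are (2n - 5)!! distinct unrooted, fully bifurcating topologies. The sketch below just evaluates that formula; the function name is made up for illustration. Heuristics such as NJ do not enumerate these trees, so this shows the size of the search space rather than the cost of any particular program.

```python
import math


def count_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating trees on n_taxa labelled tips.

    Uses the standard double-factorial formula (2n - 5)!! for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 10, 1000):
        topologies = count_unrooted_topologies(n)
        # log10 keeps the printout readable for very large counts.
        print(f"{n} taxa: roughly 10^{math.log10(topologies):.0f} topologies")
```

Even at 10 sequences the count is already in the millions, which is why every practical method is a heuristic.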

You have, AFAIK, two options:

Use RAxML, which is a very nice application and known to be fast.

Use MrBayes compiled in MPI mode, which will also take some time.
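For the RAxML option on a single multi-core box, a threaded run might be launched roughly as sketched here. This is a minimal sketch under assumptions: the binary name (`raxmlHPC-PTHREADS`) depends on how RAxML was built, and the file name, run name, model and seed are placeholders rather than recommendations. The helper only builds the argument list; it does not run anything.

```python
import shlex


def raxml_pthreads_command(alignment, run_name, threads=16,
                           model="GTRGAMMA", seed=12345):
    """Build an argv list for a threaded RAxML ML search (hypothetical helper)."""
    return [
        "raxmlHPC-PTHREADS",   # assumed binary name; varies by build
        "-T", str(threads),    # number of threads
        "-s", alignment,       # input alignment
        "-n", run_name,        # suffix for output files
        "-m", model,           # substitution model
        "-p", str(seed),       # random seed for the parsimony starting tree
    ]


if __name__ == "__main__":
    # "aln.phy" and "viral_run1" are placeholder names.
    print(shlex.join(raxml_pthreads_command("aln.phy", "viral_run1")))
```

Passing the list to `subprocess.run(..., check=True)` would start the job, assuming the binary is on the PATH.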

Of course, you can try downloading a parallel NJ package; a quick Google search turned up a couple, but I don't know how fast or reliable they are.

ADD COMMENT • link 14.0 years ago by Paulo Nuin ★ 3.7k

1

Agree, ClustalW is not a good choice for trees.

ADD REPLY • link 15.1 years ago by Neilfws 49k

1

RAxML has become pretty much the gold standard for ML phylogenetic reconstruction. A reasonable and much faster alternative is FastTree. There is also RAxML-Light, a stripped-down version of RAxML optimized for extremely large taxonomic sets.

For your alignments there are also much better options out there than Clustal. MUSCLE is one option. I don't recall offhand whether MAFFT does nucleotides.
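On the nucleotide question: MAFFT does accept nucleotide input, and it can be told so explicitly with `--nuc`. A hedged sketch of a threaded invocation follows; the input file name and thread count are placeholders, and the helper name is invented for illustration. MAFFT writes the alignment to standard output, so the caller has to redirect it to a file.

```python
import shlex


def mafft_command(fasta_path, threads=16):
    """Build an argv list for a nucleotide MAFFT alignment (hypothetical helper)."""
    return [
        "mafft",
        "--auto",                 # let MAFFT choose a strategy for the data size
        "--nuc",                  # treat the input as nucleotide sequences
        "--thread", str(threads),
        fasta_path,
    ]


if __name__ == "__main__":
    # "viral_genes.fasta" is a placeholder input file.
    print(shlex.join(mafft_command("viral_genes.fasta")) + " > aligned.fasta")
```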

ADD REPLY • link 14.0 years ago by DG 7.3k

0

RAxML looks interesting. I'll have to give it a go on a small set and see how it fares; it will probably give better trees as well. Thanks!

ADD REPLY • link 15.1 years ago by Hanif Khalak ★ 1.3k

Ram · Answer 2 · 2010-05-30

5

15.1 years ago

Marcin Cieslik ▴ 520

Do you already have an alignment?

- remove redundancy (through fast clustering, e.g. uclust)
- use a fast algorithm (NJ over ML/MP/Bayes)
- use a fast, memory-efficient implementation:
  - http://nimbletwist.com/software/ninja/
  - http://www.microbesonline.org/fasttree/

I think with <10,000 sequences you should be fine.

ADD COMMENT • link updated 6.8 years ago by Ram 45k • written 15.1 years ago by Marcin Cieslik ▴ 520

2

NJ and MP are horrible ideas for doing trees today. There are incredibly fast implementations of full ML out there that can do thousands to tens of thousands of taxa. RAxML itself is reasonably fast on large datasets, but FastTree and RAxML-Light are both optimized for extremely large bacterial and viral datasets and environmental studies.

Removing redundancy is a good idea, but depending on your question and data, you might only want to do it at the 100% identity level.
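Collapsing at the 100% identity level needs no clustering tool at all, since it is just grouping identical strings. A minimal sketch, assuming plain FASTA input and case-insensitive exact matching; the function names are illustrative, and it does not handle sequences that are substrings of one another.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collapse_identical(records):
    """Group headers by exact (case-insensitive) sequence identity."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


if __name__ == "__main__":
    example = ">a\nACGT\n>b\nacgt\n>c\nACGA\n"
    for seq, headers in collapse_identical(parse_fasta(example)).items():
        # Keep the first header as the representative of each group.
        print(headers[0], len(headers), seq)
```

Keeping the list of collapsed headers per representative makes it possible to map tips of the final tree back to all the original sequences.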

ADD REPLY • link 14.0 years ago by DG 7.3k

0

Wow! Both NINJA and FastTree claim at least a 10x speedup over other similar NJ and ML methods, respectively. Definitely going to try them out. Thanks!

ADD REPLY • link 15.1 years ago by Hanif Khalak ★ 1.3k

0

@bubaker: It would be great if you could report back what you found, both with these and with Paulo's suggestions!

ADD REPLY • link 15.1 years ago by Nicojo ★ 1.1k

score 1 · Answer 3 · 2010-05-30

1

15.1 years ago

Elipapa ▴ 90

MAFFT and MUSCLE are fast aligners. For speedy (and accurate) tree building, few things are better than FastTree, in my opinion.
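Once an alignment exists, a FastTree run on nucleotide data could look something like the sketch below. The helper names and file names are assumptions for illustration. FastTree writes the Newick tree to standard output, so the wrapper redirects it to a file; the OpenMP build, usually installed as `FastTreeMP`, can be swapped in through the `executable` argument.

```python
import shutil
import subprocess


def fasttree_command(alignment, executable="FastTree", gtr=True):
    """Build an argv list for FastTree on a nucleotide alignment."""
    cmd = [executable, "-nt"]      # -nt: nucleotide alignment
    if gtr:
        cmd.append("-gtr")         # GTR model instead of the default
    cmd.append(alignment)
    return cmd


def run_fasttree(alignment, tree_path, executable="FastTree"):
    """Run FastTree and write the resulting Newick tree to tree_path."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    with open(tree_path, "w") as out:
        subprocess.run(fasttree_command(alignment, executable),
                       stdout=out, check=True)


if __name__ == "__main__":
    # "aligned.fasta" is a placeholder alignment file.
    print(" ".join(fasttree_command("aligned.fasta")))
```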

ADD COMMENT • link 15.1 years ago by Elipapa ▴ 90

score 0 · Answer 4 · 2010-05-30

0

15.1 years ago

Biomed 5.0k

If you don't want to change the software approach but are looking for faster computing, I suggest you look at Amazon cloud services.

ADD COMMENT • link 15.1 years ago by Biomed 5.0k

0

I would never suggest that. Why would it change anything? He already has access to a 16-core machine, so he would just waste money.

ADD REPLY • link 15.1 years ago by Paulo Nuin ★ 3.7k