Question: Parallel Phylogenetic Tree Generation

Hanif Khalak (Doha, QA) wrote, 8.8 years ago:

I am trying to build some large-scale viral phylogenies from thousands of gene DNA sequences, each almost 2 kb long, using a parallel version of ClustalW written for an SMP machine.

I don't have access to a large cluster, but on the 16-core machine I'm using I found that most of the processing time is spent not in the pairwise alignment but in the tree building, where only one CPU is used.

One run with ~10K sequences had still not completed after a couple of months. I had to reboot, but only because of a power test. Go Linux!

Any suggestions for alternatives that speed up tree generation?

Paulo Nuin (Canada) wrote, 8.8 years ago:

First things first: don't use ClustalW for tree generation. It's an alignment program, and its Neighbour Joining implementation is not as good as some others available. Second, thousands of sequences will take a long time even with an NJ approach. Just calculate how many possible tree arrangements there are for that many sequences: there's no magic bullet here.
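To make the point about the number of possible arrangements concrete: the number of distinct unrooted, fully bifurcating topologies for n labelled taxa is the standard double factorial (2n-5)!!. The short Python sketch below just evaluates that formula; the function name is hypothetical. Note that NJ does not enumerate these trees (it builds one tree greedily) and ML programs search them heuristically, which is exactly why exhaustive evaluation is out of the question at this scale.

```python
def num_unrooted_topologies(n_taxa):
    """Count distinct unrooted, fully bifurcating tree topologies for
    n labelled taxa: (2n - 5)!! = 3 * 5 * ... * (2n - 5), for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3 .. 2n - 5
        count *= k
    return count


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, num_unrooted_topologies(n))
    # For ~10,000 viral sequences, only report how many digits the count has.
    print(len(str(num_unrooted_topologies(10_000))))
```

Already at 10 taxa there are over two million topologies, and for 10,000 sequences the count has tens of thousands of digits.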

You have, AFAIK, two options:

  1. Use RAxML, which is a very nice application and is known to be fast.
  2. Use MrBayes compiled in MPI mode, which will also take some time.

Of course, you can also try a parallel NJ package. A quick Google search turned up a couple, but I don't know how fast or reliable they are.

(Modified 7.7 years ago.)

Neilfws wrote, 8.8 years ago:

Agreed, ClustalW is not a good choice for trees.

Dan Gaston wrote, 7.7 years ago:

RAxML has become pretty much the gold standard for maximum-likelihood (ML) phylogenetic reconstruction. A reasonable and much faster alternative is FastTree. There is also RAxML-Light, a stripped-down version of RAxML optimized for extremely large taxon sets.

For the alignments themselves there are also much better options than Clustal; MUSCLE is one. I don't recall offhand whether MAFFT handles nucleotides.

Hanif Khalak wrote, 8.8 years ago:

RAxML looks interesting. I'll give it a go on a small set and see how it fares; it will probably give better trees as well. Thanks!

Marcin Cieslik wrote, 8.8 years ago:

Do you already have an alignment?

  1. remove redundancy (through fast clustering, e.g. UCLUST)
  2. use a fast algorithm (NJ rather than ML, MP or Bayesian methods)
  3. use a fast, memory-efficient implementation (e.g. Ninja for NJ or FastTree for ML)

I think you should be fine with fewer than 10,000 sequences.
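To show what step 2 involves, here is a minimal pure-Python sketch of textbook neighbor joining on a small, purely illustrative five-taxon distance matrix. It is not how Ninja or FastTree are implemented (both use far more optimized strategies); the function and variable names are hypothetical. Each of the n-3 joins recomputes the Q criterion over all remaining pairs, which is where the O(n^3) cost of naive NJ comes from, and it is also inherently sequential from one join to the next.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor joining; returns an unrooted Newick string.

    names: taxon labels (at least 3).
    dist:  {(label_a, label_b): distance} for every unordered pair.
    Each of the n - 3 joins scans all pairs, so the run is O(n^3).
    Ties in Q are broken by the first pair in node order.
    """
    d = {frozenset(pair): float(v) for pair, v in dist.items()}

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[a]: total distance from a to every other current node.
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) * d(a, b) - r[a] - r[b].
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to their new parent.
        la = 0.5 * D(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        lb = D(a, b) - la
        # Label the new internal node with its own Newick subtree.
        new = f"({a}:{la:g},{b}:{lb:g})"
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (D(a, c) + D(b, c) - D(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [new]

    # Three nodes remain: attach them to a single central node.
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Small toy distance matrix (five taxa), purely illustrative.
TOY_DISTANCES = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}

if __name__ == "__main__":
    print(neighbor_joining(list("abcde"), TOY_DISTANCES))
    # prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Storing the distances in a dictionary keyed by node pairs keeps the sketch short, but it is memory-hungry; that is one reason dedicated implementations matter at 10,000 sequences.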

(Modified 6 months ago by RamRS.)

Dan Gaston wrote, 7.7 years ago:

NJ and MP are horrible choices for building trees today. There are incredibly fast implementations of full ML that can handle thousands to tens of thousands of taxa. RAxML itself is reasonably fast on large datasets, but FastTree and RAxML-Light are both optimized for extremely large bacterial and viral datasets and for environmental studies.

Removing redundancy is a good idea, but depending on your question and data you might want to do it only at the 100% identity level.
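Removing redundancy only at the 100% identity level amounts to collapsing exact duplicates before tree building. A minimal Python sketch, assuming unaligned sequences in FASTA format; the helper names are hypothetical. Comparison is case-insensitive and ignores line wrapping, and the members map lets you re-attach the collapsed sequences to the tree afterwards. Near-identical sequences are not merged; that still needs a clustering tool such as UCLUST.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records):
    """Keep the first record for each distinct sequence (100% identity).

    Returns (representatives, members): representatives is a list of
    (header, upper-cased sequence) in first-seen order; members maps each
    kept header to the headers of all sequences identical to it."""
    first_header = {}  # upper-cased sequence -> header of first occurrence
    members = {}
    for header, seq in records:
        key = seq.upper()
        if key in first_header:
            members[first_header[key]].append(header)
        else:
            first_header[key] = header
            members[header] = [header]
    representatives = [(h, s) for s, h in first_header.items()]
    return representatives, members


if __name__ == "__main__":
    example = """>v1
ACGTACGT
>v2
acgt
acgt
>v3
ACGTTCGT
""".splitlines()
    reps, members = collapse_identical(parse_fasta(example))
    print(len(reps))        # prints 2
    print(members["v1"])    # prints ['v1', 'v2']
```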

Hanif Khalak wrote, 8.8 years ago:

Wow! Ninja and FastTree claim at least 10x speedups over other NJ and ML methods, respectively. Definitely going to try them out. Thanks!

Nicojo wrote, 8.8 years ago:

@bubaker: It would be great if you could report back on what you find, both with these tools and with Paulo's suggestions!

Elipapa wrote, 8.8 years ago:

MAFFT and MUSCLE are fast aligners. For speedy (and accurate) tree building, few things are better than FastTree, in my opinion.
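Since the original concern was that tree building used only one of the 16 cores, one option is FastTreeMP, the OpenMP build of FastTree, whose thread count is set through the OMP_NUM_THREADS environment variable. The Python sketch below only assembles and runs a command line. It assumes FastTreeMP is installed on PATH and that the input is a nucleotide alignment in FASTA format; -nt (nucleotide input) and -gtr (GTR model) are FastTree's documented options, FastTree writes the Newick tree to stdout, and the helper names, file names and thread count are placeholders.

```python
import os
import shutil
import subprocess


def fasttree_command(alignment_path, nucleotide=True, gtr=True,
                     binary="FastTreeMP"):
    """Build the argument list for FastTree / FastTreeMP.

    -nt selects nucleotide input and -gtr the GTR model; the alignment is
    passed as the last argument and the Newick tree goes to stdout."""
    cmd = [binary]
    if nucleotide:
        cmd.append("-nt")
    if gtr:
        cmd.append("-gtr")
    cmd.append(alignment_path)
    return cmd


def run_fasttree(alignment_path, tree_path, threads=16, binary="FastTreeMP"):
    """Run FastTree(MP), writing its stdout (the tree) to tree_path.

    FastTreeMP's OpenMP thread count comes from OMP_NUM_THREADS."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    with open(tree_path, "w") as out:
        subprocess.run(fasttree_command(alignment_path, binary=binary),
                       stdout=out, env=env, check=True)


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own alignment.
    print(" ".join(fasttree_command("viral_alignment.fasta")))
```

Calling run_fasttree("viral_alignment.fasta", "viral_tree.nwk") would then run the equivalent of `OMP_NUM_THREADS=16 FastTreeMP -nt -gtr viral_alignment.fasta > viral_tree.nwk`.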


Biomed (Bethesda, MD, USA) wrote, 8.8 years ago:

If you don't want to change your software approach but are looking for faster computing, I suggest you look at Amazon's cloud services.


Paulo Nuin wrote, 8.8 years ago:

I would never suggest that; why would it change anything? He already has access to a 16-core machine, so he would just be wasting money.