Calculating Bootstrap Support From Newick Trees
13.8 years ago

I'm using PhyML to compute phylogenetic trees. In principle, there's a parallel option (via MPI), but it doesn't work for me. Instead of spending lots of time debugging MPI, I was wondering if I could run the bootstraps independently (with a user-supplied tree) on my cluster. So the real question is: Given a list of trees in Newick format, how do I calculate the bootstrap support for the original tree?

phylogenetics • 7.0k views
13.8 years ago
Paulo Nuin ★ 3.7k

You can use a strategy similar to what Phylip does. Generate 1000 resampled input files with SEQBOOT and run PhyML on each of them. This is not real parallelism inside PhyML; you simply start separate PhyML processes on different nodes/cores at the same time (using something like this).

At the end CONSENSE will calculate a consensus tree and give you the actual bootstrap values.
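If you just want to see what the resampling step does, here is a minimal sketch in Python (chosen so it runs without extra libraries). It is not SEQBOOT itself: it only draws alignment columns with replacement, which is the core idea of a bootstrap replicate. The function names, the toy alignment and the simple "name, space, sequence" PHYLIP-like output are assumptions for illustration, so check what input layout your PhyML version expects.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to the same length")
    # The same column indices are used for every sequence, so columns stay intact.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


def to_phylip(seqs):
    """Format sequences in a simple PHYLIP-like layout (assumed, not validated)."""
    length = len(next(iter(seqs.values())))
    lines = [f"{len(seqs)} {length}"]
    lines += [f"{name} {s}" for name, s in seqs.items()]
    return "\n".join(lines) + "\n"


# Hypothetical toy alignment; in practice you would read your own file.
alignment = {"taxA": "ACGTACGT", "taxB": "ACGTTCGT", "taxC": "ACGAACGA"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(3)]

print(to_phylip(replicates[0]))
```

Each replicate can then be written to its own file and handed to a separate PhyML process.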

13.3 years ago
Rvosa ▴ 580

Bootstrap replicates are independent, so to the first part of your question: yes, you can simply create however many bootstrapped matrices you require (e.g. 1000) and run a tree search on each of those on separate nodes of your cluster. You can use seqboot for this, or other utilities with similar functionality. As a plug for Bio::Phylo you might, for example, do the following:

use Bio::Phylo::IO 'parse';

my ($matrix) = @{ parse(
  -format => 'nexus', # or any of the other supported formats, e.g. 'phylip'
  -file   => 'myfile.nex', # or a string, url or handle
  -as_project => 1,
)->get_matrices };

for my $i ( 1 .. 1000 ) {
  my $bootstrapped = $matrix->bootstrap;
  # write replicate $i to myfile$i.nex (myfile1.nex .. myfile1000.nex)
  open my $outfh, '>', "myfile${i}.nex" or die $!;
  print $outfh "#NEXUS\n", $bootstrapped->to_nexus;
  close $outfh or die $!;
}

...which gives you a thousand bootstrapped versions of 'myfile.nex', named 'myfile1.nex' through 'myfile1000.nex'. Then you run a tree search on each of those, concatenate the resulting trees into a list of Newick strings (as per your original query) and do the following:

use Bio::Phylo::IO 'parse';

my $forest = parse(
  -format => 'newick',
  -file   => 'mytrees.dnd',
);

my $consensus = $forest->make_consensus( -branches => 'frequency' );
open my $outfh, '>', 'consensus.dnd' or die $!;
print $outfh $consensus->to_newick;

This gives you a little more flexibility in the file formats you can use beyond PHYLIP and Newick (and, consequently, in the tree-search programs you can use), but otherwise it is equivalent to using seqboot and consense. Depending on the number of sequences and bootstrap replicates, there might be performance issues with using Perl.

13.8 years ago

Here's one approach (proposed 17 years ago), though I don't think it fully captures the "normal" way PhyML operates: use consense from Phylip to calculate the consensus tree from bootstrap trees generated by independent instances of PhyML. The consensus tree contains numbers that might be similar to the bootstrap values, but they are not projected onto the original tree.

(Marked as community wiki for further enlightenment.)
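To get numbers that refer to the original tree rather than to a consensus, one option is to count, for each internal branch of the original tree, how many bootstrap trees contain the same bipartition of the taxa. Below is a rough Python sketch of that idea, not PhyML's own implementation. It assumes plain Newick strings with unquoted taxon names, identical taxon sets in all trees, and an unrooted interpretation; `parse_newick`, `splits` and `bootstrap_support` are names made up for this example, and the trees are toy data.

```python
import re


def parse_newick(text):
    """Parse Newick into nested lists of leaf names (lengths and labels dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    stack, prev = [[]], None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        elif tok == ";":
            break
        elif tok != "," and prev != ")":  # text right after ")" is an internal label
            name = tok.split(":")[0].strip()
            if name:
                stack[-1].append(name)
        prev = tok
    return stack[0][0]


def splits(tree):
    """Return (taxa, nontrivial bipartitions), each stored as the side without
    the alphabetically first taxon so that rooting does not matter."""
    clades = []

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(leaves(child) for child in node))
        clades.append(clade)
        return clade

    taxa = leaves(tree)
    ref = min(taxa)
    found = set()
    for clade in clades:
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip single-taxon and whole-tree splits
            found.add(side)
    return taxa, found


def bootstrap_support(reference_newick, replicate_newicks):
    """Percentage of replicate trees containing each split of the reference tree."""
    taxa, ref_splits = splits(parse_newick(reference_newick))
    counts = dict.fromkeys(ref_splits, 0)
    n = 0
    for nwk in replicate_newicks:
        rep_taxa, rep_splits = splits(parse_newick(nwk))
        if rep_taxa != taxa:
            raise ValueError("replicate tree has a different taxon set")
        n += 1
        for s in ref_splits & rep_splits:
            counts[s] += 1
    if n == 0:
        raise ValueError("no replicate trees given")
    return {s: 100.0 * c / n for s, c in counts.items()}


# Toy data: the "original" tree and four bootstrap trees.
reference = "((A,B),(C,D),E);"
replicates = [
    "((A,B),(C,D),E);",
    "((A,B),(C,E),D);",
    "((A,C),(B,D),E);",
    "(((A,B),C),D,E);",
]
support = bootstrap_support(reference, replicates)
for split, pct in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(",".join(sorted(split)), pct)
```

The sketch reports support per split instead of writing the values back into the Newick string. For real data, an established tree library is probably the safer choice.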
