Program for searching for a specific partition within a set of phylogenetic trees
2
1
Entering edit mode
9.9 years ago
rwn ▴ 590

Hello,

Does anyone know of a program (or a function within another program) that is able to read a set of phylogenetic trees and report back which of them conform to a predefined phylogenetic partition? In other words, something where I can say 'give me all trees that partition taxa A, B and C from taxa X, Y and Z' from a set of input trees.

Phylip's Consense comes close, in that it outputs the number of times each partition is observed in a set of trees, but it doesn't tell you which trees they are. I am also aware of the BioPerl $tree->is_monophyletic(@nodes) function in the Bio::Tree::TreeFunctionsI package, which can probably do the job, but was just wondering if there was anything already out there that I had missed.

Thanks!

phylogenetics • 2.8k views
ADD COMMENT
3
Entering edit mode
9.9 years ago
jhc ★ 3.0k

some weeks ago I sent a similar code example to a friend. It uses ETE to check for the monophyly of a given set of values in unrooted trees. You could easly customize the code to test for the monophyly of your two partitions, check if the are direct sister branches, etc.

A more complete version of the check_monophyly function is also in the current development version of ETE (i.e. to detect polyphyletic or paraphyletic relationships): https://github.com/jhcepas/ete/blob/master/ete_dev/coretype/tree.py#L1841


from ete2 import PhyloTree

def check_monophyly(t, values, attr):
    ''' Returns True if all attributes (attr) in a tree (t) are monophyletic for the

        provided set of values. The most monophyletic partition is also returned. '''
    content = t.get_cached_content()
    alltips = content[t]
    targets = set([n for n in alltips if getattr(n, attr) in values])
    smallest = None
    for n, leaves in content.iteritems():
        if targets.issubset(leaves) and (not smallest or len(leaves) < len(smallest)):
            smallest = leaves
        else:
            # if the node itself is not defining the monophyly, break on through
            # to the other side... and check if the monophyly is there
            other_side = alltips - leaves
            if targets.issubset(other_side) and (not smallest or len(other_side) < len(smallest)):
                smallest = other_side
    return len(smallest) == len(targets), smallest

# Test examples

t = PhyloTree('(aaa1, (aaa3, (aaa4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['aaa']), attr='species')

t = PhyloTree('(aaa1, (bbb3, (aaa4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['aaa']), attr='species')

t = PhyloTree('(aaa1, (aaa3, (aaa4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['bbb']), attr='species')

t = PhyloTree('(aaa1, (aaa3, (aaa4, (bbb1, ccc2))));')
print check_monophyly(t, values=set(['bbb', 'ccc']), attr='species')

t = PhyloTree('(bbb1, (aaa3, (aaa4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['bbb', 'ccc']), attr='species')

t = PhyloTree('(aaa1, (aaa3, (bbb4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['bbb4', 'bbb2']), attr='name')

t = PhyloTree('(aaa1, (aaa3, (bbb4, (bbb1, bbb2))));')
print check_monophyly(t, values=set(['bbb1', 'bbb2']), attr='name')
ADD COMMENT
0
Entering edit mode

Thanks for that - I've added a Perl script that probably does something very similar below. I'm interested how ETE detects monophyletic relationships in unrooted trees though? I thought that the definition of monophyly, paraphyly etc depends on knowing where the root of the phylogeny lies. Does it look for subclusters of leaves within unrooted trees that is not interrupted by another 'outgroup' taxa?

ADD REPLY
0
Entering edit mode

exactly. If the target tree is unrooted, it will search for "potential" monophyly, meaning "could these nodes form a monophyletic clade? "

ADD REPLY
1
Entering edit mode
9.9 years ago
rwn ▴ 590

Here's a Perl version that searches through a set of trees (e.g., a directory full of newick trees) for the monophyly of a predetermined set of taxa. Might be useful to someone else!


#!/usr/bin/perl -w

=head1 NAME
parseTrees.pl

=head1 USAGE
parseTrees.pl ingroupStrains.txt /path/to/trees nameOfRootTaxon

=head1 SYNOPSIS
Takes a set of trees, either as a single file or as a directory full of files (like newicks), and reports back whenever it finds a monophyletic clade of predetermined interest. Thus, you say 'show me all trees that have taxa1, taxa2 and taxa3 as a mono clade' and it will find them.

Takes a list of strains, one per line, as first argument, the path to a directory of (newick) tree files as second argument and the name (exactly as it appears in the trees) of the taxon you want to use to root the tree as the third.

=head1 OUTPUTS
A list of which trees have your clade of interest in them.

=head1 AUTHOR
R.W. Nowell 2014

=cut

use strict;
use Getopt::Long;
use Pod::Usage;
use Bio::TreeIO;
use Bio::Tree::TreeFunctionsI;
use Sort::Naturally;

my %options;
my $results = GetOptions (\%options, 
    'help|h'
  ) or pod2usage(-verbose => 3);

pod2usage(-verbose=>3) if ( ($options{'help'}) or (@ARGV == 0) );

## get ingroups strains from file
open (GR, $ARGV[0]) or die "\n\tCannot open '$ARGV[0]': $!\n\n";
my @ingroup;
while (<GR>) {
    chomp;
    push (@ingroup, $_) unless ($_ =~ m/^\#/); ## can blank out strains with # char in strains file
}
close GR;

## cd to dir of trees
chomp (my $path = $ARGV[1]);
chdir($path) or die "\n\tCannot change to '$path': $!\n\n";

my @trees = grep { -f } glob '*'; ## assumes everything in dir is a newick tree file...
print "\n\tNumber of tree files: ".@trees."\n\n";

my $count2 = 1;

foreach my $file (@trees) {
    ## get tree from file
    my $in = new Bio::TreeIO (-file => $file, -format => 'newick', -internal_node_id => 'bootstrap');

    my %node_objs;
    my $root_node = chomp($ARGV[2]); ## define root
    my $count1 = 1;

    while ( my $tree = $in->next_tree() ) {    
        ## get a list of node objects
        for my $node ($tree->get_nodes) {    
            ## set a node name (if not a leaf) - root is called N1, and iterates from there 
            unless ( defined ($node->id) ) {
                $node->id("N".$count1);
                $count1++;
            }
            $node_objs{($node->id)}=($node);
        }

        my @ingroup_node_objs;
        foreach (nsort @ingroup) {
            push (@ingroup_node_objs, $node_objs{$_}); ## list of ingroup node objects
        }

        ## returns true if ingroup_node_objs are monophyletic ANYWHERE in the tree,
        if ($tree->is_monophyletic (-nodes => \@ingroup_node_objs, -outgroup => $node_objs{$root_node}) ) { 

            print "Tree ".$count2.": ingroup monophyly found, bootstrap support is ";
            my $lca = $tree->get_lca( -nodes => \@ingroup_node_objs); ## get BS support for clade
            print $lca->bootstrap().".\n";

        } else {
            print "Tree ".$count2.": not found.\n";
        }
    }
    $count2++;
}
__END__
ADD COMMENT

Login before adding your answer.

Traffic: 2987 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6