Question: Clustering In Perl
6.0 years ago, Chad A. Davis wrote:

I'm currently using Algorithm::Cluster, which is based on the C Clustering Library, to cluster sequences and structures in Perl. Algorithm::Cluster provides many clustering facilities, including hierarchical clustering. Given the desired number of clusters, it builds the tree and cuts it. What I need, however, is a library that accepts a distance threshold instead of a cluster count. For example: all members of one cluster are <= X apart, or any two members of different clusters are >= X apart.

Is this possible in Algorithm::Cluster? Or is there another (Perl) module that would, given a distance matrix and a threshold, determine the appropriate number of clusters and their members?

perl clustering • 3.9k views
6.0 years ago, Chad A. Davis wrote:

I've submitted a patch against Algorithm::Cluster to allow:

my $cluster_ids = $tree->cutthresh(3.75);

The patch adds an XS interface (i.e. the code is in C). You can find it on this bug report:

https://rt.cpan.org/Public/Bug/Display.html?id=68482

Those interested in a quick pure-Perl solution can use this example, which relies on some undocumented XS interfaces:

use feature 'say';

sub cutthresh {
    my ($tree, $thresh) = @_;
    my @nodecluster;
    my @leafcluster;
    # Binary tree: number of internal nodes is 1 less than # of leaves
    # Last node is the root; walk down the tree from there
    # (assumes $tree->length returns the number of internal nodes)
    my $last = $tree->length - 1;
    my $icluster = 0;
    # Root node belongs to cluster 0
    $nodecluster[$last] = $icluster++;
    for (my $i = $last; $i >= 0; $i--) {
        my $node = $tree->get($i);
        # Debug output: node index, its cluster, its merge distance
        say sprintf "%3d %3d %.3f", $i, $nodecluster[$i], $node->distance;
        my $left = $node->left;
        # Nodes are numbered -1,-2,... Leaves are numbered 0,1,2,...
        my $leftref = $left < 0 ? \$nodecluster[-$left-1] : \$leafcluster[$left];
        my $assigncluster = $nodecluster[$i];
        # Left is always the same as the parent node's cluster
        $$leftref = $assigncluster;
        say sprintf "\tleft  %3d %3d", $left, $$leftref;
        my $right = $node->right;
        # Put right into a new cluster when thresh is not satisfied
        if ($node->distance > $thresh) { $assigncluster = $icluster++ }
        my $rightref = $right < 0 ? \$nodecluster[-$right-1] : \$leafcluster[$right];
        $$rightref = $assigncluster;
        say sprintf "\tright %3d %3d", $right, $$rightref;
    }
    # One cluster id per leaf, indexed by leaf number
    return @leafcluster;
}
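
For anyone who wants to check results without Algorithm::Cluster installed, here is a minimal sketch of the second criterion from the question (any two members of different clusters are more than X apart). It is not the patch and not the module's API: threshold_clusters and the toy distance matrix are made up for illustration. It joins every pair within the threshold using a union-find, which should give the same grouping as cutting a single-linkage tree at that threshold. Other linkage methods would generally need the tree.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::Cluster): group items so that
# any two items in different clusters are more than $thresh apart.
# $dist is a reference to a symmetric distance matrix (array of arrays).
sub threshold_clusters {
    my ($dist, $thresh) = @_;
    my $n = @$dist;
    my @parent = (0 .. $n - 1);    # union-find forest: each item starts as its own root

    # Find the root of item $i, shortening the path as we go
    my $find = sub {
        my ($i) = @_;
        while ($parent[$i] != $i) {
            $parent[$i] = $parent[$parent[$i]];
            $i = $parent[$i];
        }
        return $i;
    };

    # Merge every pair whose distance is within the threshold
    for my $i (0 .. $n - 1) {
        for my $j ($i + 1 .. $n - 1) {
            next if $dist->[$i][$j] > $thresh;
            my ($ri, $rj) = ($find->($i), $find->($j));
            $parent[$rj] = $ri if $ri != $rj;
        }
    }

    # Relabel roots as consecutive cluster ids 0, 1, 2, ...
    my (%id, @cluster);
    my $next = 0;
    for my $i (0 .. $n - 1) {
        my $root = $find->($i);
        $id{$root} = $next++ unless exists $id{$root};
        $cluster[$i] = $id{$root};
    }
    return @cluster;
}

# Toy data (made up): items 0,1 are close, items 2,3 are close
my @dist = (
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.5, 5.5],
    [5.0, 4.5, 0.0, 2.0],
    [6.0, 5.5, 2.0, 0.0],
);
my @ids = threshold_clusters(\@dist, 3.75);
print "@ids\n";    # prints "0 0 1 1"
```

The number of clusters falls out of the threshold rather than being passed in, which is the behaviour asked for above. The pairwise loop is quadratic in the number of items, so this is only meant for small matrices or for sanity-checking a tree cut.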

The pure Perl version of this has now been implemented as http://p3rl.org/Algorithm::Cluster::Thresh for those who are interested.

(reply written 5.9 years ago by Chad A. Davis)
6.0 years ago, Alastair Kerr (The University of Edinburgh, UK) wrote:

Do you have to use Perl? I am a huge fan of Perl, but for this sort of task I would use R. I have used this website to learn clustering in R. You can still use Perl to connect with R if you have to. I played with RSPerl for a while, but in the end it was easier for me just to use R scripts.
