Clustering Using Phylip
13.1 years ago
Kevin ▴ 100

Hi, I have 320 protein motifs which I would like to cluster by similarity. I have constructed a distance matrix for these motifs and used it to build a tree. The tree contains the clusters and I want to extract them. My problem now is breaking the tree into clusters.

Does anyone know of software which can do this? I could write a Newick parser that extracts the clusters, and will do so if there are no alternatives. Thanks, Kevin

clustering motif • 3.6k views
13.1 years ago

If your end goal is to get clusters, I would not go via a tree. Instead, I would use the all-against-all distance matrix that you have already created as input to, for example, Markov Clustering (MCL), which would directly give you clusters. In my experience this is easier and gives better results, although I should mention that I have not tested it on the exact problem you are working on.

You can find the software for MCL here: http://micans.org/mcl/
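One practical detail: MCL treats edge weights as similarities (larger means more alike), so a distance matrix has to be converted first. Below is a minimal sketch of writing label-pair input in MCL's "ABC" format. The function name `distances_to_abc`, the cutoff `max_dist`, and the transform `1 - d / max_dist` are illustrative assumptions, not something MCL prescribes, and the toy matrix is made up.

```python
def distances_to_abc(labels, dist, max_dist):
    """Return MCL 'ABC' lines (label_a <TAB> label_b <TAB> weight).

    Pairs with distance >= max_dist are dropped. The similarity
    1 - d / max_dist is an illustrative choice (an assumption).
    """
    lines = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            d = dist[i][j]
            if d < max_dist:
                weight = 1.0 - d / max_dist
                lines.append(f"{labels[i]}\t{labels[j]}\t{weight:.4f}")
    return lines


# Toy symmetric distance matrix for four hypothetical motifs.
labels = ["m1", "m2", "m3", "m4"]
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]

abc_lines = distances_to_abc(labels, dist, max_dist=0.5)
print("\n".join(abc_lines))
```

If these lines are saved to a file such as `motifs.abc` (a hypothetical name), MCL can be run on it with `mcl motifs.abc --abc -I 2.0 -o motifs.clusters`; each line of the output file is one cluster. The inflation value `-I` controls granularity, so it is worth trying a few values.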


That worked well. Thanks for your help.


Good to hear it worked :-)

13.1 years ago
toshnam ▴ 650

I think this paper may help. In it, the authors constructed phylogenetic trees from a distance matrix using the Fitch program from the Phylip package.

13.1 years ago

Lars has already given a nice answer; here are my thoughts on using the Phylip tools to create clusters.

Clustering using Phylip is a classic example of a bioinformatics hack. I have done this myself for a set of motifs before, and the result was quite intuitive. You can start with an alignment of your motifs and then route it through the Phylip workflow.

alignment* -> seqboot -> protdist -> neighbor -> consense

The resulting tree file can be used as input to any tree visualization tool. A dendrogram with bootstrap values is the ideal way to see the significant nodes or clusters. This tree can then be used to visualize and analyze clusters using the concept of a phylogenetic tree. Here you have a distinct advantage: Phylip is aware of the protein sequence context, and you get the final tree after rigorous bootstrapping, which shows the significance of your nodes (or clusters).
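To pull clusters out of such a tree programmatically, one option is to cut it at a fixed height. The sketch below is only an illustration under several assumptions: the tree is rooted, its branch lengths are distances (check what the branch lengths in your tree file actually represent), and labels are unquoted. The function names, the example tree, and the cutoff are all made up.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children.

    Minimal parser: no quoted labels, no comments.
    """
    text = "".join(text.split())
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip '('
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ')'
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]  # leaf name or internal label (e.g. support)
        length = 0.0
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def leaves(n):
    """All leaf names below (or at) this node, in tree order."""
    if not n["children"]:
        return [n["name"]]
    return [name for child in n["children"] for name in leaves(child)]


def height(n):
    """Largest distance from this node down to any of its leaves."""
    return max((c["length"] + height(c) for c in n["children"]), default=0.0)


def cut_tree(n, max_height):
    """Return maximal subtrees whose height is at most max_height."""
    if height(n) <= max_height:
        return [leaves(n)]
    return [cl for child in n["children"] for cl in cut_tree(child, max_height)]


tree = "((m1:0.05,m2:0.05):0.4,(m3:0.1,m4:0.1):0.35);"
clusters = cut_tree(parse_newick(tree), max_height=0.2)
print(clusters)
```

Because the cut is measured from each node down to its leaves, the result depends on where the tree is rooted. An unrooted neighbor-joining tree would need to be rooted first (or built with the UPGMA option in neighbor) for the cutoff to be meaningful.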
