Question: Automatic Clustering Of Biological Sequences
0
8.0 years ago by
Anjan800
United States
Anjan800 wrote:

K-means requires one to specify the number of clusters in advance, and hierarchical clustering (HCL) requires the user to visually inspect a tree. Have you successfully used an automatic clustering algorithm that finds an optimal partitioning of the data on its own? Thanks, Anjan

sequence clustering • 2.4k views
4

Have you looked here? http://biostar.stackexchange.com/questions/1504/self-learning-gene-expression-k-means-clustering-in-r

written 8.0 years ago by Michael Dondrup (46k)
3
8.0 years ago by
Jan Kosinski1.6k
Jan Kosinski1.6k wrote:

I doubt that you can find any fully automatic clustering method that will always "optimally" partition your sequences. It depends so much on 1) the biological question you want to answer with your clustering, and 2) the data itself.

For clustering protein sequences I always used http://www.eb.tuebingen.mpg.de/departments/1-protein-evolution/software/clans

It produces a 3D graph layout of sequence space, based on which I divided the sequences into clusters manually. In this program it's relatively easy to analyze the structure of sequence space, make all the selections etc., and find out which clusters the outliers belong to.

1

Minor comment - Tancred moved Down Under a few years ago, and an updated version of CLANS is available at its new homepage: http://bioinfoserver.rsbs.anu.edu.au/programs/clans/

written 8.0 years ago by Pawel Szczesny (3.2k)

Thanks for the link. I was considering a multidimensional scaling approach to place a group of aligned sequences in 2-D or 3-D space. In the second step I would run a Differential Evolution (DE) based clustering algorithm that would find the optimal partition. DE has been successfully used in automated clustering of diverse data and also in image segmentation.
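
For anyone who wants to try the two-step idea above, here is a minimal sketch under several assumptions. The toy alignment, the function names, and the p-distance (fraction of mismatched columns) are all made up for illustration. The embedding step is classical (Torgerson) MDS written in NumPy. The second step is not a DE-based algorithm: as a simpler stand-in, it picks the number of clusters by maximizing the silhouette score over k-means runs from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy aligned sequences (invented for illustration): two obvious groups.
ALIGNMENT = [
    "ACGTACGTAC", "ACGTACGTAA", "ACGTACGTCC",
    "TTGCATGCAT", "TTGCATGCAA", "TTGCATGGAT",
]


def p_distance_matrix(seqs):
    """Fraction of mismatched columns between every pair of aligned sequences."""
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)


def classical_mds(distance, n_components=2):
    """Classical MDS: double-center the squared distances, keep top eigenvectors."""
    n = distance.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distance ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues that arise when distances are not Euclidean.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


def pick_partition(coords, max_k=4, seed=0):
    """Stand-in for the DE step: choose k by the best silhouette score."""
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)
        if len(set(labels)) < 2:
            continue  # silhouette needs at least two clusters
        score = silhouette_score(coords, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


coords = classical_mds(p_distance_matrix(ALIGNMENT))
best_k, labels = pick_partition(coords)
print(best_k, labels)
```

The silhouette criterion is only one of many cluster validity indices, so which partition counts as "optimal" still depends on the index you choose.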

written 8.0 years ago by Anjan (800)
2
8.0 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

One method that I haven't seen mentioned here (or in other linked articles) is affinity propagation. There is a video on the page that shows you how it works on a small dataset, and it is (as far as I know) the only clustering-type algorithm that was the focus of an entire publication in Science(!).


I have tried affinity propagation in the scikit-learn package, and it looks pretty good. Basically, one feeds in an affinity matrix (calculated from a phylogenetic distance matrix by taking (1 - distance)), or Euclidean coordinates that are derived from an affinity or distance matrix. If your labels are in your data as well, you can easily feed them in and find out who clusters with whom. Sample code (minus the data I'm working with) available upon request!
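
A minimal sketch of the workflow described above, not the poster's own code. The six sequence names and the distance values are invented placeholders; the only real API used is scikit-learn's `AffinityPropagation` with `affinity="precomputed"`. It assumes distances are scaled to the range 0 to 1, so that (1 - distance) is a sensible similarity.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical labels and pairwise distances (symmetric, zero diagonal).
names = ["seqA", "seqB", "seqC", "seqD", "seqE", "seqF"]
distance = np.array([
    [0.00, 0.10, 0.12, 0.85, 0.90, 0.88],
    [0.10, 0.00, 0.08, 0.87, 0.92, 0.86],
    [0.12, 0.08, 0.00, 0.89, 0.91, 0.84],
    [0.85, 0.87, 0.89, 0.00, 0.11, 0.09],
    [0.90, 0.92, 0.91, 0.11, 0.00, 0.13],
    [0.88, 0.86, 0.84, 0.09, 0.13, 0.00],
])

# Convert distances to affinities as described in the post.
affinity = 1.0 - distance

# With affinity="precomputed" the input is used directly as the similarity
# matrix; the default preference is the median of the input similarities.
model = AffinityPropagation(affinity="precomputed", random_state=0)
cluster_ids = model.fit_predict(affinity)

# Group the labels by cluster to see who clusters with whom.
members = defaultdict(list)
for name, cluster_id in zip(names, cluster_ids):
    members[int(cluster_id)].append(name)

for cluster_id, group in sorted(members.items()):
    print(cluster_id, group)
```

The number of clusters is not specified anywhere; it emerges from the `preference` value, so lowering or raising that parameter is the main knob if the default gives too many or too few clusters.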

written 6.0 years ago by ericmajinglong (100)
1
8.0 years ago by
hadasa1.0k
hadasa1.0k wrote:

Have a look at MCL graph clustering. http://micans.org/mcl/
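
To give a feel for what Markov clustering does, here is a toy NumPy sketch of its two alternating steps, expansion (squaring the column-stochastic matrix) and inflation (raising entries to a power and renormalizing). This is not the `mcl` program from the link above; the function name, the parameter defaults, and the example graph are assumptions chosen for illustration, and it ignores the pruning and sparse-matrix tricks a real implementation needs.

```python
import numpy as np


def mcl_sketch(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a small dense adjacency matrix."""
    m = np.asarray(adjacency, dtype=float)
    m = m + np.eye(m.shape[0])             # add self-loops
    m = m / m.sum(axis=0, keepdims=True)   # make columns sum to 1
    for _ in range(max_iter):
        previous = m
        m = m @ m                          # expansion: two-step random walks
        m = m ** inflation                 # inflation: favor strong flows
        m = m / m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    # Rows that still carry flow belong to attractors; the columns with
    # nonzero entries in such a row form one cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.flatnonzero(row > 1e-5))
        if members:
            clusters.add(members)
    return sorted(clusters)


# Example graph: two 4-node cliques (0-3 and 4-7) joined by one edge (3-4).
graph = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                graph[i, j] = 1.0
graph[3, 4] = graph[4, 3] = 1.0

print(mcl_sketch(graph))
```

As in the real tool, the inflation value controls granularity: higher values tend to produce more, smaller clusters, and no cluster count is given up front.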
