K-means requires one to specify the number of clusters; clustering based on HCL requires the user to visually inspect a tree. Have you successfully used an automatic clustering algorithm that provides an optimal partitioning of clustered data? Thanks, Anjan
I doubt that you will find a fully automatic clustering method that always partitions your sequences "optimally": the result depends heavily on 1) the biological question you want to answer with your clustering and 2) the data itself.
For clustering protein sequences I always used http://www.eb.tuebingen.mpg.de/departments/1-protein-evolution/software/clans
It produces a 3D graph layout of sequence space, based on which I divided the sequences into clusters manually. In this program it's relatively easy to analyze the structure of sequence space, make all the selections etc., and find out which clusters the outliers belong to.
One method that I haven't seen mentioned here (or in the other linked articles) is affinity propagation. There is a video on the page that shows you how it works on a small dataset, and it is (as far as I know) the only clustering-type algorithm that was the focus of an entire publication in Science(!).
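To give a feel for how affinity propagation picks the number of clusters by itself, here is a minimal pure-Python sketch of the message-passing updates (responsibilities and availabilities). It is only an illustration under my own assumptions: the toy 1-D points, the negative squared distance as similarity, the median similarity as the "preference", and the function name `affinity_propagation` are all made up for this example. For real sequence data you would plug in your own similarity matrix (e.g. alignment scores).

```python
import statistics

def affinity_propagation(S, damping=0.5, iterations=200):
    """Toy affinity propagation. S is a square similarity matrix
    (list of lists); S[k][k] holds the preference of point k."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(v for k, v in enumerate(scores) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar if its combined self-evidence is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try another preference")
    # Every other point joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels

# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 0.1, 0.25, 5.0, 5.2, 5.3]
n = len(points)
S = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
# Assumption: use the median off-diagonal similarity as the preference.
preference = statistics.median(S[i][j] for i in range(n)
                               for j in range(n) if i != j)
for k in range(n):
    S[k][k] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

Note that the method is not entirely parameter-free: the preference value plays the role that k plays in K-means, since a higher preference tends to yield more clusters and a lower one fewer. If you would rather not roll your own, scikit-learn ships an implementation as `sklearn.cluster.AffinityPropagation`.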