Question

Clustering Dna Sequences Using K-Means

13.4 years ago

Monzoor ▴ 310

I want to cluster DNA sequences using oligonucleotide frequency vectors. Are there stand-alone k-means implementations available for this?
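Oligonucleotide frequency vectors are usually built by counting overlapping k-mers (words of length k) in each sequence and normalising by the number of windows. A minimal sketch of that step, assuming plain A/C/G/T strings and skipping any window that contains another character (such as N); the function name kmer_frequencies is hypothetical:

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return the relative frequency of every possible DNA k-mer in seq.

    The vector always has 4**k entries in lexicographic order (AA, AC, ...),
    so vectors computed from different sequences are directly comparable.
    Windows containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of length k over the sequence (overlapping k-mers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    # Normalise to frequencies so sequences of different lengths are comparable
    return [c / total for c in counts]
```

For example, kmer_frequencies("ACGTAC", k=2) has 16 entries, with 0.4 for AC and 0.2 each for CG, GT and TA. Each sequence then becomes one row of the matrix handed to the clustering step.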

Updated 13.4 years ago by Michael 54k • written 13.4 years ago by Monzoor ▴ 310

Ram · Answer 1 · 2010-12-16

13.4 years ago

Michael 54k

I gave an answer to a similar question here using R code.

If you replace hclust with kmeans, you are already there, and with a short script R becomes a 'stand-alone solution'.

Try ?kmeans to see the available options. For very large datasets you can also try the Kmeans implementation in the amap package.
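For anyone not using R, here is a rough stdlib-only Python sketch of plain Lloyd's k-means on a list of equal-length frequency vectors. It is an illustration, not R's kmeans (whose default algorithm is Hartigan-Wong) nor amap's Kmeans; the function names and the random initialisation are assumptions:

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k input vectors drawn without replacement
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [None] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Lloyd's algorithm only finds a local optimum, so in practice you would run it with several seeds and keep the result with the lowest total within-cluster sum of squares; R's kmeans exposes the same idea through its nstart argument.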

Updated 4.6 years ago by Ram 43k • written 13.4 years ago by Michael 54k

Thanks, Michael. I will try it and let you know.

Reply by Monzoor ▴ 300, 13.3 years ago

Ram · Answer 2 · 2010-12-16

13.4 years ago

Prateek ★ 1.0k

I'm not sure how different it is from k-means, but cd-hit is what I use for clustering protein sequences, and it can also be used for nucleotide sequences. It uses an incremental clustering algorithm and is pretty fast.

site - http://www.bioinformatics.org/cd-hit/

user's guide - http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf
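To make "incremental clustering" concrete, here is a simplified sketch of the greedy idea cd-hit is built on: sort sequences by decreasing length, then put each one into the first existing cluster whose representative is similar enough, or make it the representative of a new cluster. The similarity used here (difflib's SequenceMatcher ratio) is a stand-in chosen for illustration; cd-hit itself uses a short-word filter and sequence identity, which this does not reproduce. Note that it groups sequences by pairwise similarity rather than by frequency vectors. The function names and threshold are hypothetical:

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; NOT cd-hit's identity measure."""
    # autojunk=False matters for DNA: with only four letters, the default
    # heuristic would treat every base as "junk" in sequences of 200+ bases.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_incremental_clusters(seqs, threshold=0.9):
    """Return a list of clusters; the first member of each is its representative."""
    clusters = []
    # Longest sequences first, so each cluster is represented by its longest member
    for seq in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)  # join the first sufficiently similar cluster
                break
        else:
            clusters.append([seq])  # no match: seq starts a new cluster
    return clusters
```

Each sequence is compared only against cluster representatives, never against every other sequence, which is one reason this style of clustering scales well to large sets.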

Updated 4.6 years ago by Ram 43k • written 13.4 years ago by Prateek ★ 1.0k

I think that is a different kind of 'clustering'.

Reply by Michael 54k, 13.4 years ago