Gene Expression Analyses and K-Means Clustering
9.6 years ago
Jay • 0

I understand that K-means clustering is used very often in gene expression analysis, and that dissimilarity is usually measured by Euclidean distance. Are there particular applications in which Euclidean distance is not the most appropriate choice for clustering? Are there other ways to measure dissimilarity between gene expression profiles?

gene • 2.2k views
9.6 years ago

There are plenty of measures to choose from. See, for example, the R function dist() for some commonly used ones (the R package proxy has more). The main problem with Euclidean distance is that, on noisy data, it quickly tends towards a constant as the number of dimensions increases: all pairwise distances become nearly equal, so the measure stops discriminating between points and becomes useless for clustering. This is known as distance concentration. More on this here. All commonly used distance and similarity measures suffer from it to varying degrees. What makes analysis possible despite this is the presence of structure or patterns in the data. In my experience, however, the cosine distance (i.e. 1 - cos, where cos is the cosine of the angle between two expression vectors) is more resistant to concentration than other measures: in the presence of noise, it may allow you to find meaningful clusters where Euclidean distance would fail.
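To make the two ideas above concrete, here is a minimal sketch in plain Python (standard library only, not the R functions mentioned above). It assumes pure Gaussian noise as a stand-in for "noisy data"; the helper names euclidean, cosine_distance and relative_contrast are made up for this illustration. The relative contrast, (max - min) / min of the distances from one point to all the others, shrinks as the number of dimensions grows, which is the concentration effect.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(angle between a and b).

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def relative_contrast(n_points, n_dims, dist, rng):
    """(max - min) / min of the distances from one random query point
    to n_points random points, all drawn from pure Gaussian noise
    (an assumption made for this illustration only)."""
    query = [rng.gauss(0, 1) for _ in range(n_dims)]
    distances = [
        dist(query, [rng.gauss(0, 1) for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


# The contrast should fall as the number of dimensions increases.
rng = random.Random(0)
for n_dims in (2, 20, 200, 2000):
    contrast = relative_contrast(200, n_dims, euclidean, rng)
    print(n_dims, round(contrast, 3))
```

Note that cosine distance depends only on the direction of the vectors: cosine_distance([1, 2, 3], [2, 4, 6]) is essentially zero, while the Euclidean distance between the same two vectors is not. Two genes with the same profile shape but different overall expression levels are therefore close under cosine distance and far apart under Euclidean distance. The sketch does not by itself show that cosine distance resists concentration better; passing cosine_distance as the dist argument lets you compare on your own data.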
