How to determine the number of clusters in hierarchical clustering?
6.1 years ago
John

Hi, I got the following R code from a previously published paper and used it to produce a graph. How should I interpret the graph to determine the number of clusters?

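```r
# Read the TPM matrix (genes in rows, samples in columns)
tpm <- read.table(file = "Single_TPM.txt", header = TRUE)

# Pearson correlation between samples
cor_mat <- cor(tpm, method = "pearson")

# Assumption: the original snippet uses `hr` without defining it; here the
# samples are clustered hierarchically on the distance 1 - Pearson correlation
hr <- hclust(as.dist(1 - cor_mat))

# To determine the number of groups: for each k, replace every sample's
# profile by its group's mean profile and record the total squared deviation
# (k runs up to 11, so this needs at least 11 samples)
distance_sum <- c()
for (k in 1:11) {
    branch <- cutree(hr, k = k)
    group_ids <- split(names(branch), branch)
    all_avg_matrix <- tpm

    for (group.n in seq_along(group_ids)) {
        group.idx <- which(colnames(tpm) %in% group_ids[[group.n]])
        # drop = FALSE keeps a single-sample group as a one-column table
        avg_exp <- rowMeans(tpm[, group.idx, drop = FALSE])
        all_avg_matrix[, group.idx] <- matrix(rep(avg_exp, length(group.idx)),
                                              ncol = length(group.idx), byrow = FALSE)
    }

    distance_sum <- c(distance_sum, sum((tpm - all_avg_matrix)^2))
}
plot(seq_along(distance_sum), distance_sum, type = "l",
     xlab = "Number of clusters (k)", ylab = "Within-cluster sum of squares")
```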
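For reference, the quantity the loop computes (each sample's squared distance from its cluster's mean profile, summed over all clusters) can be checked on simulated data with three known groups, without needing the original file. This is a minimal sketch: the helper name `within_ss_curve`, the simulated data and the choice of `1 - cor` with average linkage are illustrative assumptions, not taken from the paper.

```r
# Within-cluster sum of squares for k = 1..k_max clusters cut from a
# hierarchical tree. `expr` has features in rows and named samples in
# columns. (Hypothetical helper for illustration.)
within_ss_curve <- function(expr, hr, k_max) {
  expr <- as.matrix(expr)
  sapply(seq_len(k_max), function(k) {
    groups <- cutree(hr, k = k)[colnames(expr)]  # cluster label per sample
    total <- 0
    for (g in unique(groups)) {
      members <- expr[, groups == g, drop = FALSE]
      centre <- rowMeans(members)                 # group mean profile
      total <- total + sum((members - centre)^2)  # centre recycles down each column
    }
    total
  })
}

# Simulated data: 200 features x 12 samples in three groups of four, each
# group built around its own random profile (illustrative only)
set.seed(1)
profiles <- matrix(rnorm(200 * 3, sd = 3), nrow = 200)
expr <- profiles[, rep(1:3, each = 4)] + matrix(rnorm(200 * 12), nrow = 200)
colnames(expr) <- paste0("s", 1:12)

hr <- hclust(as.dist(1 - cor(expr)), method = "average")
wss <- within_ss_curve(expr, hr, k_max = 8)
plot(seq_along(wss), wss, type = "b",
     xlab = "Number of clusters (k)", ylab = "Within-cluster sum of squares")
```

Because cuts of one tree are nested, the curve can only go down as k grows; with three true groups the big drops stop at k = 3, which is the bend the answers below describe.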

If there is any other method that suits this better, please let me know. I use TPM values from RSEM output for clustering.

[Image: the resulting plot of `distance_sum` against the number of clusters]

Tags: R, RNA-Seq
6.1 years ago

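It looks like you're using the elbow method to determine the ideal number of clusters, in which case I agree with Johannes that 3 or 4 is the ideal number: that is where the curve bends (the "elbow"), and beyond it each additional cluster reduces the total within-cluster sum of squares by much less.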
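To make "where the curve bends" a little less subjective, one option is to look at how much each extra cluster reduces the curve and pick the k with the largest discrete second difference. A minimal sketch, assuming a decreasing curve such as `distance_sum` from the question's loop; the helper name and the values below are hypothetical, not the poster's results.

```r
# Pick the elbow as the k with the largest discrete second difference
# (the sharpest bend) of a decreasing within-cluster SS curve.
# (Hypothetical helper for illustration.)
elbow_by_second_diff <- function(wss) {
  stopifnot(length(wss) >= 3)
  k <- 2:(length(wss) - 1)
  bend <- wss[k - 1] - 2 * wss[k] + wss[k + 1]
  k[which.max(bend)]
}

# Hypothetical curve for illustration (not the values from the question)
wss <- c(100, 60, 25, 20, 17, 15, 14)
-diff(wss)                 # reduction from each extra cluster: 40, 35, 5, 3, 2, 1
elbow_by_second_diff(wss)  # 3
```

With the question's data this would be `elbow_by_second_diff(distance_sum)`.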

Other methods you can use to determine the ideal number of clusters include:

I employed all of these methods in my recently published work: Vitamin D prenatal programming of childhood metabolomics profiles at age 3 y.

6.1 years ago
caggtaagtat

Hi, I think you have to look for the knee (elbow) of the curve, the point after which adding more clusters gives little further decrease. In this case I would try 3 or 4 clusters.
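The knee can also be located geometrically. A minimal sketch of one common heuristic, offered as an assumption rather than the answerer's method: rescale both axes to [0, 1] and take the point farthest from the straight line joining the first and last points of the curve. The helper name and example values are hypothetical.

```r
# Knee of a decreasing curve by the "maximum distance to the chord"
# heuristic: rescale k and the curve to [0, 1], join the first and last
# points with a straight line, and return the k whose point lies farthest
# from that line. (Hypothetical helper for illustration.)
knee_by_chord <- function(wss) {
  stopifnot(length(wss) >= 3, max(wss) > min(wss))
  k <- seq_along(wss)
  x <- (k - 1) / (length(wss) - 1)
  y <- (wss - min(wss)) / (max(wss) - min(wss))
  # For a curve falling from its maximum at k = 1 to its minimum at the last
  # k, the rescaled chord is the line x + y = 1; this is each point's
  # perpendicular distance to it.
  dist <- abs(x + y - 1) / sqrt(2)
  k[which.max(dist)]
}

# Hypothetical curve for illustration (not the values from the question)
wss <- c(120, 70, 40, 33, 29, 26, 24, 23)
knee_by_chord(wss)  # 3
```

The rescaling matters: without it, the answer would depend on the units of the y-axis.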

6.1 years ago
arta

In R, there is a nice Bioconductor package called ConsensusClusterPlus that helps you choose the number of clusters for unsupervised algorithms by measuring how stable the clusters are across repeated subsamples of the data.
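The idea behind consensus clustering can be sketched in base R. This is a hand-rolled illustration, not the package's own API: it assumes samples in columns, a `1 - Pearson correlation` distance with average linkage, 80% subsampling, and the PAC (proportion of ambiguous clustering) score with thresholds 0.1 and 0.9. `consensus_matrix` and `pac` are hypothetical helper names.

```r
# Repeatedly subsample the samples, cluster each subsample hierarchically
# into k groups, and record how often each pair of samples lands in the
# same cluster when both are drawn. (Hypothetical helper for illustration.)
consensus_matrix <- function(expr, k, reps = 100, p_item = 0.8) {
  n <- ncol(expr)
  together <- matrix(0, n, n)  # times each pair was clustered together
  sampled  <- matrix(0, n, n)  # times each pair was drawn together
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, size = floor(p_item * n)))
    hr  <- hclust(as.dist(1 - cor(expr[, idx])), method = "average")
    cl  <- cutree(hr, k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  ifelse(sampled > 0, together / sampled, NA)
}

# PAC: share of sample pairs whose consensus lies strictly between `lower`
# and `upper`; a lower PAC suggests a more stable choice of k.
pac <- function(cm, lower = 0.1, upper = 0.9) {
  vals <- cm[upper.tri(cm)]
  vals <- vals[!is.na(vals)]
  mean(vals > lower & vals < upper)
}

# Simulated data: 200 features x 15 samples in three groups of five
# (illustrative only)
set.seed(42)
profiles <- matrix(rnorm(200 * 3, sd = 3), nrow = 200)
expr <- profiles[, rep(1:3, each = 5)] + matrix(rnorm(200 * 15), nrow = 200)

pac_by_k <- sapply(2:6, function(k) pac(consensus_matrix(expr, k, reps = 50)))
names(pac_by_k) <- 2:6
print(round(pac_by_k, 3))
```

At the true k every pair is either always or never clustered together, so the consensus matrix is close to 0/1 and PAC is near zero; the package itself offers more clustering algorithms and diagnostic plots than this sketch.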


Thanks for the reminder! ConsensusCluster is yet another method.
