Question: fviz_nbclust (kmeans) with method "gap_stat" error: did not converge in 10 iterations
10 months ago by
lessismore610
Mexico
lessismore610 wrote:

Dear all,

I'm trying to find the optimal number of clusters for a gene expression dataset.

For this, I'm using the packages FactoMineR and factoextra, and the function fviz_nbclust, on my scaled data frame (a simple data frame with genes in rows and samples in columns).

scale() z-scores by column, so I transpose first and then scale. Then I transpose back and calculate the optimal number of clusters.

The problem is that I get the warning message "did not converge in 10 iterations".
My question is: do you know a way to change the number of iterations?

This is the code I'm using:

df <- scale (t(mydata))
df <- t(df)
fviz_nbclust(df, kmeans, method = "gap_stat")
fit <- kmeans(df, ?) 
mydata2 <- data.frame(df, fit$cluster)

?: this value is set by the predicted number of clusters
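
For readers who want to check the transpose-scale-transpose step, here is a minimal sketch on made-up data. The matrix `toy` and the value `k <- 2` are hypothetical stand-ins for `mydata` and for the cluster number read off the gap statistic plot; nothing here comes from the original dataset.

```r
# Toy stand-in for `mydata` (hypothetical values): 6 genes x 4 samples.
set.seed(42)
toy <- matrix(rnorm(24, mean = 5, sd = 2), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# scale() standardises columns, so transpose, scale, and transpose back
# to z-score each gene (row).
toy_scaled <- t(scale(t(toy)))

# Each row should now have mean ~0 and standard deviation 1.
row_means <- rowMeans(toy_scaled)
row_sds   <- apply(toy_scaled, 1, sd)

# k is a placeholder (assumption); in practice it is chosen from the
# gap statistic result.
k <- 2
fit <- kmeans(toy_scaled, centers = k)
toy_clustered <- data.frame(toy_scaled, cluster = fit$cluster)
```

Inspecting `row_means` and `row_sds` confirms that the genes, not the samples, were standardised before clustering.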

Thanks in advance

clustering kmeans • 1.5k views
ADD COMMENTlink modified 10 months ago by Kevin Blighe41k • written 10 months ago by lessismore610
10 months ago by
Kevin Blighe41k
Kevin Blighe41k wrote:

Good morning, my friend,

Yes, you can create a custom kmeans function and then supply that to fviz_nbclust(). In the custom function, you increase the iter.max parameter from the default of 10 to something higher, like (here) 50:

MyKmeansFUN <- function(x,k) list(cluster=kmeans(x, k, iter.max=50))

fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")
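
As a rough illustration of raising iter.max for the gap statistic, the sketch below calls cluster::clusGap() directly on made-up data and passes iter.max through its `...` argument, which clusGap() documents as further arguments for FUNcluster. The toy matrix, K.max = 5, B = 20, and nstart = 10 are illustrative assumptions. Whether fviz_nbclust() forwards extra arguments in the same way is also an assumption here, based only on the `...` in its signature.

```r
library(cluster)  # clusGap() and maxSE(); a recommended package shipped with R

set.seed(123)
# Toy data (hypothetical): three well-separated groups of 30 points in 2D.
toy <- rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
             matrix(rnorm(60, mean = 5),  ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))

# Extra named arguments (iter.max, nstart) are forwarded to kmeans(),
# so each k-means run may use up to 50 iterations instead of 10.
gap <- clusGap(toy, FUNcluster = kmeans, K.max = 5, B = 20,
               iter.max = 50, nstart = 10, verbose = FALSE)

# Choose k with the "firstSEmax" rule, one of the rules maxSE() offers.
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
```

`gap$Tab` holds one row per candidate k, so printing it alongside `best_k` shows how the choice was made.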

Note that I also have parallel-processing-enabled Gap Statistic functions on my GitHub page:

Please try that.

Kevin

ADD COMMENTlink written 10 months ago by Kevin Blighe41k

Dear Kevin, thanks a lot for your help. Do you have any idea about what this means?

> fviz_nbclust(df, FUNcluster=MyKmeansFUN, method="gap_stat")
    Clustering k = 1,2,..., K.max (= 10): .. done
    Bootstrapping, b = 1,2,..., B (= 100)  [one "." per sample]:
    .................................................. 50 
    .................................................. 100 
    There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
2: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
3: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
4: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
.....
50: In ans * length(l) + if1 :
  longer object length is not a multiple of shorter object length
ADD REPLYlink written 10 months ago by lessismore610

This warning occurs when you perform element-wise operations on vectors whose lengths differ and the longer length is not a multiple of the shorter one. For example:

c(1, 2, 3, 4) * c(10, 10, 10)
[1] 10 20 30 40
Warning message:
In c(1, 2, 3, 4) * c(10, 10, 10) :
  longer object length is not a multiple of shorter object length

c(1, 2, 3, 4) + c(10, 10, 10)
[1] 11 12 13 14
Warning message:
In c(1, 2, 3, 4) + c(10, 10, 10) :
  longer object length is not a multiple of shorter object length

What are the dimensions of your input data? Are you using k.max=10 and B=100?
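
One possible reading of these warnings, offered as an assumption rather than something confirmed in this thread: cluster::clusGap() documents that FUNcluster should return a list whose `cluster` component is a vector with one label per row. The wrapper posted above stores the entire kmeans object in `cluster`, which is itself a list of components of differing lengths, and that may be what gets recycled. The sketch below uses made-up data, and `wrapped` and `vector_only` are hypothetical names.

```r
set.seed(1)
toy <- matrix(rnorm(200), nrow = 50)  # hypothetical 50 x 4 data

# Wrapper as posted above: `cluster` holds the whole kmeans object (a list).
wrapped <- function(x, k) list(cluster = kmeans(x, k, iter.max = 50))

# Variant that returns only the assignment vector (one label per row).
vector_only <- function(x, k) list(cluster = kmeans(x, k, iter.max = 50)$cluster)

w <- wrapped(toy, 3)$cluster
v <- vector_only(toy, 3)$cluster

is.list(w)              # TRUE: a kmeans object, not a label vector
lengths(unclass(w))     # its components have differing lengths
is.atomic(v)            # TRUE: a plain vector of cluster labels
length(v) == nrow(toy)  # TRUE: one label per observation
```

If this is indeed the cause, passing `vector_only` (or plain `kmeans` with iter.max supplied as an extra argument) as FUNcluster would be worth trying.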

ADD REPLYlink modified 10 months ago • written 10 months ago by Kevin Blighe41k
> dim(df)
[1] 2068   25

I'm using the default parameters:

fviz_nbclust(df, FUNcluster = MyKmeansFUN, method = "gap_stat", diss = NULL, k.max = 10, nboot = 100,
  verbose = interactive(), barfill = "steelblue", barcolor = "steelblue",
  linecolor = "steelblue", print.summary = TRUE, ...)
ADD REPLYlink modified 10 months ago • written 10 months ago by lessismore610