Visualizing K-Means Clustering Results
5
0
10.8 years ago
RB ▴ 20

Hi All

I clustered my data with k-means clustering in R, using 300 clusters. Can anyone please help me plot these results in a scatter plot using R?

Thanks very much.

RT

visualization clustering r • 25k views
0

What kind of data is it? How many dimensions?

0

Yep, how many label types do you have?

0

It is expression data, say 15 samples and 10,000 genes. I first clustered the data using hierarchical clustering and got 300 clusters. Then I ran k-means clustering with the number of clusters set to 300. When I use the plot function, it does not plot anything. I am new to R, please help.

0

So, you want to plot your 10,000 genes each as a point and have them visually clustered together or colored according to which of 300 clusters they belong to? I'm not clear exactly what you want. If you want a scatterplot then you need to define x and y axes. Not clear what those would be given you have 15 samples.
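
To make the axes question concrete: one common choice is to use the first two principal components of the gene-by-sample matrix as x and y, and to color each gene by its k-means cluster. Below is a minimal sketch in base R. The matrix `expr` and its contents are simulated stand-ins (an assumption, not the poster's data), and only 5 clusters are used so the colors stay readable.

```r
set.seed(1)
# Simulated stand-in for an expression matrix: 1000 genes x 15 samples (assumption)
expr <- matrix(rnorm(1000 * 15), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:15)))

k <- 5                                   # far fewer than 300, for readability
km <- kmeans(expr, centers = k, nstart = 5)

# PCA on the genes (rows): the first two components serve as x and y axes
pca <- prcomp(expr, center = TRUE, scale. = FALSE)
xy <- pca$x[, 1:2]

# One point per gene, colored by its cluster assignment
plot(xy, col = rainbow(k)[km$cluster], pch = 20, cex = 0.7,
     xlab = "PC1", ylab = "PC2", main = "Genes colored by k-means cluster")
legend("topright", legend = paste("cluster", 1:k), col = rainbow(k), pch = 20)
```

With 300 clusters the same call works, but 300 rainbow colors will be hard to tell apart by eye.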

4
10.8 years ago

I suggest using the ade4 package: you just have to make a factor from your k-means result:

library(ade4)
dimA<-runif(15)
dimB<-runif(15)
myData<-data.frame(dimA,dimB)
kres<-kmeans(myData,3)
plot(myData)
kmeansRes<-factor(kres$cluster)
s.class(myData,fac=kmeansRes, add.plot=TRUE, col=rainbow(nlevels(kmeansRes)))

If you have row names (i.e. sample names), I advise using the s.label() function instead of plot().

4
5.9 years ago
Ron ★ 1.1k

The method is similar to what Obi used, but I used ggplot for plotting the final figure. This assumes RNA expression data where the samples are columns and the genes are rows.

## k means clustering
library(fpc)
library(ggplot2)
kclust=kmeans(t(data),centers=3)
kclust$cluster <- as.factor(kclust$cluster)
d=dist(t(data), method = "euclidean")
fit=cmdscale(d,eig=TRUE, k=2) # k is the number of dim

## ggplot visualization
p = ggplot(data.frame(t(data)), aes(fit$points[,1], fit$points[,2], color = factor(kclust$cluster)))
p <- p + theme(axis.title.y = element_text(size = rel(1.5), angle = 90))
p <- p + theme(axis.title.x = element_text(size = rel(1.5), angle = 00))
p= p + theme(axis.text=element_text(size=16,angle=90),axis.title=element_text(size=20,face="bold")) + geom_point(size=4)
p= p + theme(legend.text = element_text(size = 14, colour = "black"))
p= p + theme(legend.title = element_text(size = 18, colour = "black"))
p= p  + theme(legend.key.size = unit(1.5, "cm"))
p

0

I used your code and it runs fine, but when I try to plot I get this error: "Error: Aesthetics must be either length 1 or the same as the data (11): x, y, colour"

Can you tell me what the issue is?

0

You need to check how you have loaded your matrix. @Ron used t(data) to transpose the data; remove the transpose and it should work.
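
For context, ggplot2 raises this error when a vector used in aes() has a different length from the number of rows of the data frame passed to ggplot(). One way to avoid it, sketched below, is to put the MDS coordinates and the cluster labels into a single data frame so that every aesthetic has exactly one value per row. The matrix `data` here is simulated (an assumption: 200 genes in rows, 11 samples in columns), and `plot_df` is a hypothetical name. The ggplot2 call is skipped if the package is not installed.

```r
set.seed(1)
# Simulated expression matrix: 200 genes (rows) x 11 samples (columns); an assumption
data <- matrix(rnorm(200 * 11), nrow = 200,
               dimnames = list(NULL, paste0("sample", 1:11)))

samples <- t(data)                        # one row per sample
kclust <- kmeans(samples, centers = 3)
fit <- cmdscale(dist(samples), eig = TRUE, k = 2)

# Coordinates and cluster labels live in the same data frame,
# so x, y and colour all have nrow(plot_df) values
plot_df <- data.frame(x = fit$points[, 1],
                      y = fit$points[, 2],
                      cluster = factor(kclust$cluster))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_df, aes(x = x, y = y, color = cluster)) +
    geom_point(size = 4)
  print(p)
}
```

If your own matrix already has samples in rows, drop the t() call; what matters is that the clustering, the distance matrix and the plotted data frame all use the same orientation.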

3
10.5 years ago

What about a PCA/MDS plot? You could use the distances between genes and then color the genes according to which k-means cluster they belong to. Try the code below. I used flexclust{kcca} instead of the standard 'kmeans' function so that I could make sure the same distance metric was being used for both the k-means clustering and the MDS plot. The only thing I'm not sure about is how well it will work with 300 clusters. I think that, no matter what, it will be hard to visualize differences between that many clusters on a scatter plot.

library(flexclust)
#Imaginary data with 3 samples and 1000 genes
myData<-data.frame(sample1=runif(1000),sample2=runif(1000),sample3=runif(1000))

#Perform k-means clustering
knum=5 #Set desired number of clusters
kres=kcca(myData,k=knum, family=kccaFamily("kmeans")) #"kmeans" family: Euclidean distance, mean centroids
cluster_assignments=kres@cluster

#Calculate distance matrix and then perform MDS/PCA
d=dist(myData, method = "euclidean") # euclidean distances between the rows
fit=cmdscale(d,eig=TRUE, k=2) # k is the number of dim

#plot solution
plot(x=fit$points[,1], y=fit$points[,2], xlab="Coordinate 1", ylab="Coordinate 2", main="MDS", type="n")
colors=rainbow(knum)[kres@cluster]
points(x=fit$points[,1], y=fit$points[,2], cex=.7, col=colors, pch=20)


0

I am doing k-means clustering and found this method for visualizing k-means results. How can I add a legend to show the sample names in this plot? My data is expression data.
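
A possible approach in base R, offered as a sketch rather than a definitive answer: if the points are samples (so the matrix is transposed before clustering), text() can write each sample name next to its point and legend() can map colors to clusters. The matrix `expr` is simulated (an assumption: 500 genes in rows, 8 samples in columns).

```r
set.seed(1)
# Simulated expression matrix: 500 genes (rows) x 8 samples (columns); an assumption
expr <- matrix(rnorm(500 * 8), nrow = 500,
               dimnames = list(NULL, paste0("sample", 1:8)))

samples <- t(expr)                        # cluster the samples, so transpose
knum <- 3
km <- kmeans(samples, centers = knum)
fit <- cmdscale(dist(samples), k = 2)     # with the default eig = FALSE this is a coordinate matrix

cols <- rainbow(knum)
plot(fit[, 1], fit[, 2], col = cols[km$cluster], pch = 20, cex = 1.5,
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "MDS")
# Write each sample name just above its point
text(fit[, 1], fit[, 2], labels = rownames(samples), pos = 3, cex = 0.8)
# The legend maps colors to clusters
legend("topright", legend = paste("cluster", 1:knum), col = cols, pch = 20)
```

Labeling points directly tends to scale better than listing every sample in a legend, but with thousands of points (for example, genes) the labels will overlap.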

2
10.8 years ago
Raygozak ★ 1.4k

You can also look at this blog post and what they call a clustergram to assess the clusters found:

http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/

0
10.8 years ago
Vitis ★ 2.5k

The ggplot2 package in R has very nice ways to show clusters, by plotting the mean/median as lines and the sd or quantiles as shaded bands. You will probably find sample code for that in the manual or on the website: http://had.co.nz/ggplot2/
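
As a rough illustration of the mean-line-plus-shaded-band idea, the sketch below computes the per-cluster mean and standard deviation of expression in each sample, then draws them with geom_line() and geom_ribbon(), one panel per cluster. The data are simulated and the names `expr` and `cluster_stats` are hypothetical. The plotting step is skipped if ggplot2 is not installed.

```r
set.seed(1)
# Simulated expression matrix: 300 genes (rows) x 15 samples (columns); an assumption
expr <- matrix(rnorm(300 * 15), nrow = 300)
km <- kmeans(expr, centers = 4)

# Per-cluster mean and standard deviation of expression in each sample
cluster_stats <- do.call(rbind, lapply(sort(unique(km$cluster)), function(cl) {
  sub <- expr[km$cluster == cl, , drop = FALSE]
  data.frame(cluster = factor(cl),
             sample = seq_len(ncol(expr)),
             mean_expr = colMeans(sub),
             sd_expr = apply(sub, 2, sd))   # NA if a cluster holds a single gene
}))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cluster_stats, aes(x = sample, y = mean_expr)) +
    geom_ribbon(aes(ymin = mean_expr - sd_expr, ymax = mean_expr + sd_expr),
                alpha = 0.3) +             # shaded band: mean +/- 1 sd
    geom_line() +                          # line: cluster mean across samples
    facet_wrap(~ cluster)
  print(p)
}
```

This summarizes each cluster as a profile across samples instead of drawing one point per gene, which may stay readable at cluster counts where a colored scatter plot does not.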