2.1 years ago
komal.rathi ★ 4.1k
I am using the R package ConsensusClusterPlus. Here is an example with the ALL data:
library(ConsensusClusterPlus)
library(ALL)
data(ALL)
d = exprs(ALL)
res <- ConsensusClusterPlus(d, clusterAlg = "pam", finalLinkage = "average",
                            distance = "spearman", plot = NULL, reps = 1000,
                            maxK = 10, pItem = 0.8, pFeature = 1, seed = 100)
So if I want to get information on the cluster membership for each sample when
k = 5, I would get it by using:
cluster5 <- res[[5]]
> head(cluster5$consensusClass, n = 10)
01005 01010 03002 04006 04007 04008 04010 04016 06002 08001
    1     2     1     2     1     1     2     1     1     3
My question is: how do I extract the most contributing features (or genes in this case) in each cluster?
Since you are clustering patients/samples on expression values, my best guess would be to split the patients by cluster membership. For example, for cluster 1, take the expression matrix of only the patients assigned to cluster 1 and compare each gene's expression against the patients in the other clusters. You can use something like a Wilcoxon rank-sum test (wilcox.test in R), then sort the results by fold change or p-value.
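A minimal sketch of that cluster-versus-rest comparison, in case it helps. The helper rank_cluster_genes() is a hypothetical name, not part of ConsensusClusterPlus, and the toy matrix is simulated only to show the call. The sketch assumes log-scale expression values (so a difference of means can be read as a log fold change) and that the class vector is in the same order as the matrix columns.

```r
# Hypothetical helper (not from ConsensusClusterPlus): test every gene for a
# difference between samples in one cluster and samples in all other clusters.
rank_cluster_genes <- function(expr, classes, cluster) {
  stopifnot(ncol(expr) == length(classes), !is.null(rownames(expr)))
  in_cluster <- classes == cluster
  stopifnot(any(in_cluster), any(!in_cluster))

  # Wilcoxon rank-sum test per gene (row): cluster vs. rest.
  # exact = FALSE uses the normal approximation, which tolerates ties.
  p_value <- apply(expr, 1, function(x) {
    wilcox.test(x[in_cluster], x[!in_cluster], exact = FALSE)$p.value
  })

  # Assumes log-scale values: difference of means ~ log fold change.
  log_fc <- rowMeans(expr[, in_cluster, drop = FALSE]) -
    rowMeans(expr[, !in_cluster, drop = FALSE])

  out <- data.frame(
    gene = rownames(expr),
    log_fc = log_fc,
    p_value = p_value,
    p_adj = p.adjust(p_value, method = "BH"),  # multiple-testing correction
    stringsAsFactors = FALSE
  )
  # Smallest p-value first; larger absolute fold change breaks ties.
  out[order(out$p_value, -abs(out$log_fc)), ]
}

# Toy data (simulated, for illustration only): 5 genes x 20 samples,
# with a shift planted in gene3 for the samples of cluster 1.
set.seed(1)
expr <- matrix(rnorm(5 * 20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:20)))
classes <- rep(1:2, each = 10)
expr["gene3", classes == 1] <- expr["gene3", classes == 1] + 10

top <- rank_cluster_genes(expr, classes, cluster = 1)
head(top)
```

With the objects from the question, the call would presumably look like rank_cluster_genes(d, res[[5]]$consensusClass, cluster = 1), repeated for each of the five clusters. Keep in mind that the same samples were used to define the clusters, so these p-values are best treated as a ranking rather than as formal evidence.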
Hi, I am posting this as an answer, but I would like to know whether you have found a way to extract the most contributing features for each cluster. I am also stuck at this point.