ConsensusClusterPlus: How to extract most contributing features for each cluster
3.1 years ago
komal.rathi ★ 4.1k

Hi,

I am using the R package ConsensusClusterPlus. Here is an example with the ALL data:

library(ConsensusClusterPlus)
library(ALL)
data(ALL)
d = exprs(ALL)

res <- ConsensusClusterPlus(d,
                     clusterAlg = "pam",
                     finalLinkage = "average",
                     distance = "spearman",
                     plot = NULL,
                     reps = 1000, 
                     maxK = 10, 
                     pItem = 0.8,
                     pFeature = 1,
                     seed = 100)

So if I want to get information on the cluster membership for each sample when k = 5, I would get it by using:

cluster5 <- res[[5]]
> head(cluster5$consensusClass, n = 10)
01005 01010 03002 04006 04007 04008 04010 04016 06002 08001 
    1     2     1     2     1     1     2     1     1     3

My question is: how do I extract the most contributing features (or genes in this case) in each cluster?

Since you are clustering patients/samples using expression values, my best guess would be to separate patients by cluster membership. For example, for cluster 1, take the expression matrix of only the patients assigned to cluster 1 and compare their gene expression against the patients in the other clusters. You can use something like a Wilcoxon rank-sum test, then sort the results by fold change or p-value.
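A minimal sketch of that one-vs-rest comparison in base R. The function name `cluster_markers` and the simulated matrix are made up for illustration; with the real data you would pass `d` and `res[[5]]$consensusClass` instead. It assumes the values are already on a log2 scale, so a difference in means can be read as a log2 fold change.

```r
# One-vs-rest marker search (hypothetical helper, not part of ConsensusClusterPlus).
# expr:    genes x samples matrix on a log2 scale (e.g. d)
# classes: cluster label per sample, in column order (e.g. res[[5]]$consensusClass)
cluster_markers <- function(expr, classes) {
  stopifnot(ncol(expr) == length(classes))
  out <- lapply(sort(unique(classes)), function(k) {
    in_k <- classes == k
    # Wilcoxon rank-sum test per gene: cluster k versus all other samples
    p <- apply(expr, 1, function(x) {
      wilcox.test(x[in_k], x[!in_k], exact = FALSE)$p.value
    })
    # Difference in mean log2 expression, read as a log2 fold change
    lfc <- rowMeans(expr[, in_k, drop = FALSE]) -
      rowMeans(expr[, !in_k, drop = FALSE])
    data.frame(cluster = k, gene = rownames(expr), log2FC = lfc,
               p.value = p, p.adj = p.adjust(p, method = "BH"),
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

# Simulated stand-in data: 50 genes x 12 samples, 3 clusters of 4 samples,
# with genes g1..g5 shifted upwards in cluster 1.
set.seed(1)
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("g", 1:50), paste0("s", 1:12)))
classes <- rep(1:3, each = 4)
expr[1:5, classes == 1] <- expr[1:5, classes == 1] + 6

markers <- cluster_markers(expr, classes)
head(markers[order(markers$p.adj), ])
```

For real microarray or RNA-seq data a dedicated differential expression package would usually be a better choice than a per-gene rank test; this only shows the shape of the comparison.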

Hi, this is not an answer, but I would like to know whether you have found a way to extract the most contributing features for each cluster. I am also stuck at this point.

7 months ago
aUser ▴ 30

Hi,

You might have figured out how to extract the most contributing features for each cluster by now, but since someone else might stumble upon this, I am writing a few lines.

You cannot extract the most contributing features directly from the clustering. The reason is that CCPlus uses all the data points to create a correlation matrix between samples and then clusters on that matrix. In doing so, the individual value of each gene is absorbed into the final value (the correlation/distance between two samples). Thus, it is not possible to extract the most contributing features from the CCPlus output directly.
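A small sketch of why the gene-level values disappear. The matrix `expr` is simulated, and the sketch assumes the Spearman distance is taken as 1 minus the correlation between samples. The object that gets clustered is samples x samples, so no gene dimension is left in it:

```r
# Simulated stand-in for the expression matrix: 200 genes x 6 samples.
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200,
               dimnames = list(paste0("g", 1:200), paste0("s", 1:6)))

# Correlation is computed between columns (samples), using all genes at once.
dist_mat <- as.dist(1 - cor(expr, method = "spearman"))

# The clustering input is 6 x 6: one number per pair of samples.
dim(as.matrix(dist_mat))
```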

To extract these features, one way, as pointed out by @halo22, is to extract the samples belonging to each cluster and calculate differential expression between the clusters. Sort the genes by logFC or p-value and then select using an arbitrary criterion (e.g. |log2FC| > 1, adjusted p-value < 0.01, or both). This has one drawback, that some genes might be duplicated, e.g. gene X is selected for both cluster 1 and cluster 2. You can then add another criterion, such as higher expression or greater significance, to assign the gene to a single cluster.
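The filtering and de-duplication step could look roughly like this. The table `markers` is a toy one-vs-rest result with made-up values and hypothetical column names (`cluster`, `gene`, `log2FC`, `p.adj`); the tie-break used here is the larger absolute log2FC, which is only one of the possible criteria.

```r
# Toy results table; every value here is invented for illustration.
markers <- data.frame(
  cluster = c(1, 1, 2, 2, 3),
  gene    = c("geneA", "geneB", "geneA", "geneC", "geneD"),
  log2FC  = c(2.5, 0.4, -1.2, 1.8, -3.0),
  p.adj   = c(0.001, 0.002, 0.004, 0.0005, 0.2),
  stringsAsFactors = FALSE
)

# Keep rows that pass both cutoffs: |log2FC| > 1 and adjusted p < 0.01.
sig <- markers[abs(markers$log2FC) > 1 & markers$p.adj < 0.01, ]

# A gene may pass for several clusters (geneA here). Sort by effect size
# and keep only the first, i.e. strongest, occurrence of each gene.
sig <- sig[order(-abs(sig$log2FC)), ]
unique_markers <- sig[!duplicated(sig$gene), ]
unique_markers
```

In this toy table geneA passes for clusters 1 and 2 and is assigned to cluster 1, where its effect is larger.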

I was also trying to figure out another way, e.g. remove one feature (gene) at a time and see whether the clusters remain intact. But the number of genes is usually very high, and it is really difficult to check cluster integrity that many times.
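For what it is worth, the leave-one-gene-out idea can be sketched on a small matrix. This is not a consensus run: to keep it cheap, a single average-linkage `hclust` on 1 minus Spearman correlation stands in for ConsensusClusterPlus, and all function names are hypothetical. With thousands of genes and 1000 resampling reps per run, the cost problem described above remains.

```r
# Rand index: fraction of sample pairs on which two partitions agree.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  ut <- upper.tri(same_a)
  mean(same_a[ut] == same_b[ut])
}

# Cheap stand-in for a full consensus run (assumption, see lead-in).
quick_clusters <- function(expr, k) {
  d <- as.dist(1 - cor(expr, method = "spearman"))
  cutree(hclust(d, method = "average"), k)
}

# Drop each gene in turn; report 1 - Rand index against the full-data partition.
gene_influence <- function(expr, k) {
  ref <- quick_clusters(expr, k)
  score <- vapply(seq_len(nrow(expr)), function(i) {
    1 - rand_index(ref, quick_clusters(expr[-i, , drop = FALSE], k))
  }, numeric(1))
  setNames(score, rownames(expr))
}

# Simulated data: 30 genes x 12 samples.
set.seed(1)
expr <- matrix(rnorm(30 * 12), nrow = 30,
               dimnames = list(paste0("g", 1:30), paste0("s", 1:12)))
influence <- gene_influence(expr, k = 3)
head(sort(influence, decreasing = TRUE))
```

A score of 0 means removing the gene left the partition unchanged; larger values mean more sample pairs changed their grouping.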

Another way could be, after clustering the samples, to cluster the genes into the same number of groups as the samples. The problem then is how to link the gene clusters to the sample clusters. For example, how can I say that sample cluster 2 is because of gene cluster 2? Maybe someone else can enlighten us here.
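One simple way to look at the link, sketched on simulated data: z-score each gene, cluster the genes, and tabulate the mean z-score of every gene cluster within every sample cluster. The variable names and the planted blocks are made up, and `sample_cl` stands in for `res[[k]]$consensusClass`. This is only a descriptive summary and does not establish that a gene cluster causes a sample cluster.

```r
# Simulated data: 40 genes x 12 samples with two planted gene blocks.
set.seed(1)
expr <- matrix(rnorm(40 * 12), nrow = 40,
               dimnames = list(paste0("g", 1:40), paste0("s", 1:12)))
sample_cl <- rep(1:2, each = 6)  # stand-in for res[[k]]$consensusClass
expr[1:10, sample_cl == 1]  <- expr[1:10, sample_cl == 1] + 4
expr[11:20, sample_cl == 2] <- expr[11:20, sample_cl == 2] + 4

# z-score each gene across samples, then cluster genes on their profiles.
z <- t(scale(t(expr)))
gene_cl <- cutree(hclust(as.dist(1 - cor(t(z))), method = "average"), k = 3)

# Rows: gene clusters. Columns: sample clusters. Cells: mean z-score.
sample_ids <- sort(unique(sample_cl))
link <- sapply(sample_ids, function(s) {
  tapply(rowMeans(z[, sample_cl == s, drop = FALSE]), gene_cl, mean)
})
dimnames(link) <- list(paste0("geneCl", rownames(link)),
                       paste0("sampleCl", sample_ids))
round(link, 2)
```

Cells with a large positive or negative value point to gene clusters that are up or down in that sample cluster.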

For example, how can I say that sample cluster 2 is because of gene cluster 2? Maybe someone else can enlighten us here.

Trying to factor an expression matrix into linked sample and gene clusters is called "biclustering" and is a separate task from identifying sample clusters on their own. The reason is that the same set of genes may "contribute" to the identity of multiple clusters. Your comment "This has one drawback, that some genes might be duplicated" suggests that you are interested specifically in the biclustering setting. You can use biclust or QUBIC to do this, and you can probably seed them with the marginal (sample) clusters.
