Question: ConsensusClusterPlus for small sample
archie (India) wrote 3 months ago:

Hello everyone

I want to identify subgroups in a cancer dataset. Before using ConsensusClusterPlus, I have a few questions:

1) What is the minimum sample size required to run ConsensusClusterPlus? I have data for 19 samples. Should I use it for clustering, or should I go for another method (e.g. PCA)?

2) While going through the manual, I found that it takes expression values as input (normalised or unnormalised?). I also have z-scores (computed from the expression data) from another analysis of the same 19 samples. Can I use the z-scores directly and perform clustering with ConsensusClusterPlus?

I will appreciate all suggestions.

Thanks

consensusclusterplus • 199 views
Ahill (United States) wrote 3 months ago:

You can use consensus clustering on 19 samples; there is no intrinsic minimum sample size. As with other clustering methods, the results will typically depend heavily on how you select informative genes. See the ConsensusClusterPlus manual for one gene selection approach: picking the most variable genes by MAD (median absolute deviation). The data should be normalized. Z-scores might be OK, but that depends very much on how they were computed (sample-wise or gene-wise?). If you are using a typical expression readout (such as normalized read counts from RNA-Seq or normalized array intensities), then using those normalized expression levels (not the z-scores) for an informative subset of the genes, with a Pearson correlation distance measure, is probably a good place to start looking for sample groupings.
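To make the two ingredients mentioned above concrete, here is a minimal sketch in plain Python of ranking genes by MAD and computing a Pearson correlation distance between sample profiles. It is not ConsensusClusterPlus code (that package is R); the gene names, the values, and the helper names `mad`, `top_mad_genes` and `pearson_distance` are all made up for illustration.

```python
from math import sqrt
from statistics import mean, median

def mad(values):
    """Unscaled median absolute deviation. (R's mad() multiplies by
    1.4826 by default, which does not change the ranking of genes.)"""
    m = median(values)
    return median([abs(v - m) for v in values])

def top_mad_genes(expr, n_top):
    """expr maps gene -> list of normalized values, one per sample.
    Returns the n_top gene names with the largest MAD."""
    return sorted(expr, key=lambda g: mad(expr[g]), reverse=True)[:n_top]

def pearson_distance(x, y):
    """1 - Pearson correlation between two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical toy matrix: 4 genes x 4 samples (illustrative values only).
expr = {
    "geneA": [1.0, 5.0, 1.2, 5.1],
    "geneB": [3.0, 3.1, 2.9, 3.0],   # nearly flat, should be dropped
    "geneC": [2.0, 8.0, 2.5, 7.5],
    "geneD": [6.0, 1.0, 5.5, 1.5],
}

selected = top_mad_genes(expr, 3)

# One profile per sample, restricted to the selected genes.
n_samples = len(next(iter(expr.values())))
profiles = [[expr[g][i] for g in selected] for i in range(n_samples)]

d_similar = pearson_distance(profiles[0], profiles[2])
d_different = pearson_distance(profiles[0], profiles[1])
print(selected, round(d_similar, 3), round(d_different, 3))
```

The distance is computed between samples (columns), because samples are what get grouped; the MAD filter only decides which genes enter those profiles.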


Hi Ahill

Thanks for your valuable answer.

I will go with the MAD approach for variable selection. The z-scores were computed sample-wise. First, I will start with the normalised intensities and try to get results. Then I will try subgrouping with the z-scores, and later see how much the results from the two approaches have in common. I have a bit of a problem understanding the plots and results. I just ran the sample data through ConsensusClusterPlus and got the following results.

k  cluster  clusterConsensus
2  1        0.90794831578128
2  2        0.758432628514517
3  1        0.624620046443652
3  2        0.911135863955618
3  3        0.986412256470072
4  1        0.890835574988102
4  2        0.886960582630877
4  3        0.666394932640416
4  4        0.98295225849986
5  1        0.86123474251129
5  2        0.884872156152216
5  3        0.556828374192177
5  4        0.839098318290865
5  5        1
6  1        0.825649752799388
6  2        0.937773728911312
6  3        0.649644539921365
6  4        0.726792776419238
6  5        0.698201730147844
6  6        1

How do I decide on k and the sample membership based on the clusterConsensus values? Do I need to fix a threshold and then choose a specific k?
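One rough way to read the clusterConsensus table above is to summarize it per k, for example by the mean and the minimum cluster consensus. The sketch below does that in plain Python using the values posted above. Treat it as a reading aid under assumptions, not as a rule from the package: the summary statistics and the function name `summarize` are illustrative choices, and the CDF and delta-area plots that ConsensusClusterPlus produces are usually consulted alongside these numbers.

```python
from collections import defaultdict
from statistics import mean

# (k, cluster, clusterConsensus) rows copied from the table above.
cluster_consensus = [
    (2, 1, 0.90794831578128), (2, 2, 0.758432628514517),
    (3, 1, 0.624620046443652), (3, 2, 0.911135863955618),
    (3, 3, 0.986412256470072),
    (4, 1, 0.890835574988102), (4, 2, 0.886960582630877),
    (4, 3, 0.666394932640416), (4, 4, 0.98295225849986),
    (5, 1, 0.86123474251129), (5, 2, 0.884872156152216),
    (5, 3, 0.556828374192177), (5, 4, 0.839098318290865), (5, 5, 1.0),
    (6, 1, 0.825649752799388), (6, 2, 0.937773728911312),
    (6, 3, 0.649644539921365), (6, 4, 0.726792776419238),
    (6, 5, 0.698201730147844), (6, 6, 1.0),
]

def summarize(rows):
    """Group cluster consensus values by k and report mean and minimum."""
    by_k = defaultdict(list)
    for k, _cluster, consensus in rows:
        by_k[k].append(consensus)
    return {k: {"mean": mean(v), "min": min(v)} for k, v in sorted(by_k.items())}

summary = summarize(cluster_consensus)
for k, stats in summary.items():
    print(f"k={k}  mean={stats['mean']:.3f}  min={stats['min']:.3f}")

# Two candidate readings: the k with the highest average consensus, and
# the k whose weakest cluster is still the most stable.
best_by_mean = max(summary, key=lambda k: summary[k]["mean"])
best_by_min = max(summary, key=lambda k: summary[k]["min"])
```

The two criteria need not agree, which is one reason a single fixed threshold on clusterConsensus is hard to justify on its own.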

Similarly, how do I decide the item membership based on these results?

    k  cluster  item   itemConsensus
1   2  1        28031  0.5002183
2   2  1        28003  0.4185504
3   2  1        28042  0.4727976
4   2  1        43012  0.5462791
5   2  1        LAL5   0.4682668
6   2  1        08018  0.5090733
7   2  1        57001  0.5897417
8   2  1        22010  0.5834408
9   2  1        01007  0.2090324
10  2  1        01003  0.2036311
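A table like the one above has one row per (k, cluster, item) combination, so one simple way to read off membership at a chosen k is to give each item to the cluster where its itemConsensus is highest. The sketch below shows that idea in plain Python. The rows are hypothetical (only cluster 1 at k=2 is shown in the post, so the sample names and values here are invented), and `assign_items` is an illustrative helper, not a package function. The ConsensusClusterPlus result object also reports a consensus class per sample for each k, which is the more direct source of membership.

```python
def assign_items(rows, k):
    """For one value of k, map each item to (cluster, itemConsensus)
    for the cluster in which its item consensus is highest."""
    best = {}
    for row_k, cluster, item, consensus in rows:
        if row_k != k:
            continue
        if item not in best or consensus > best[item][1]:
            best[item] = (cluster, consensus)
    return best

# Hypothetical (k, cluster, item, itemConsensus) rows, values invented.
item_consensus = [
    (2, 1, "sampleA", 0.50), (2, 2, "sampleA", 0.10),
    (2, 1, "sampleB", 0.20), (2, 2, "sampleB", 0.75),
    (3, 1, "sampleA", 0.40), (3, 2, "sampleA", 0.05), (3, 3, "sampleA", 0.30),
]

membership_k2 = assign_items(item_consensus, 2)
for item, (cluster, consensus) in membership_k2.items():
    print(item, "-> cluster", cluster, f"(itemConsensus {consensus:.2f})")
```

Items whose best itemConsensus is still low, or nearly tied between two clusters, are the ones whose assignment is least stable.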

Thanks in advance A

Reply written 3 months ago by archie
chris86 (London, United Kingdom) wrote 5 weeks ago:

I wouldn't bother with consensus clustering for really small sample sizes like that, because your statistical power is poor. IMO it is better to use standard hierarchical clustering with aheatmap or ComplexHeatmap.

I usually run consensus clustering on larger sample sizes, at least over 60. I think the sample-resampling method that the Monti algorithm uses isn't going to behave well until you have a larger number of samples.
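To get a feel for what the resampling has to work with at different sample sizes, here is a small arithmetic sketch in plain Python. It assumes an 80% item subsampling fraction (the package default for `pItem`) and, as a labeled assumption, that the subsample size is that fraction rounded down. The function name `cosample_stats` and the `reps` value are illustrative. It does not settle whether 19 samples are too few; it only shows how many samples each resampled clustering sees.

```python
from math import comb, floor

def cosample_stats(n_samples, p_item=0.8, reps=1000):
    """Return (subsample size, probability that a given pair of samples
    lands in the same subsample, expected number of reps with that pair).
    Assumes each subsample draws floor(n_samples * p_item) samples
    without replacement."""
    m = floor(n_samples * p_item)
    p_pair = comb(m, 2) / comb(n_samples, 2)
    return m, p_pair, reps * p_pair

for n in (19, 60):
    m, p_pair, expected = cosample_stats(n)
    print(f"n={n}: {m} samples per subsample, "
          f"pair co-sampled with p={p_pair:.3f} (~{expected:.0f} of 1000 reps)")

m19, p19, _ = cosample_stats(19)
```

Under these assumptions, each resampled clustering of a 19-sample dataset is built from only 15 samples, which is the kind of situation the concern above is about.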
