Class Validation Using Hierarchical Clustering (HC) Of Genes
9.2 years ago
Ezhil La ▴ 10

Hi,

I have selected a set of 114 genes from a large meta-analysis. I then used an independent data set (not used in the meta-analysis) to cluster the samples (2 groups) with HC, based on the genes selected from the meta-analysis. The HC correctly clusters all samples except 3 out of 40.

Now I want to randomly select 114 genes 1000 times, to show that none of these random gene sets clusters the samples as well as my original set does. Is there an R package that can do this?

Thanks a lot, Ezhil

clustering classification
9.2 years ago

Can't you just use the sample() function? Something like:

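## assuming you have a character vector of all gene names called "allGenesName"
set.seed(42)  # make the random draws reproducible
n <- 1000     # number of random gene sets
p <- 114      # genes per set, same size as the signature
rand114Genes <- matrix(NA_character_, nrow = n, ncol = p)

for (i in seq_len(n)) {
  ## sample() draws without replacement by default, so no gene repeats within a set
  rand114Genes[i, ] <- sample(allGenesName, p)
}
## then repeat your previous analysis in a loop (or with apply()) over the rows of rand114Genes
## and finally compile the classification performance of each random set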

Just an add-on remark: to me, 114 genes seems like a lot for a signature set. If you have not done so yet, I would suggest sorting them by their weight in the classification and adding them one at a time, from 1 to 114, to see how the classification performance evolves (the first ~15 may well give you the same result).

Finally, have you considered any cross-validation technique in your meta-analysis?
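
A rough sketch of that incremental check, again in base R on simulated data. The weights here are an assumption: an absolute Welch t-statistic computed on the toy data itself, purely so the example runs. In a real analysis the ranking should come from the meta-analysis, not from the validation set, otherwise the result is optimistic. `hc_errors()` is a hypothetical helper and the 15 informative genes are made up.

```r
# Hypothetical toy data: 114 signature genes x 40 samples, two groups of 20.
set.seed(2)
expr <- matrix(rnorm(114 * 40), nrow = 114,
               dimnames = list(paste0("g", 1:114), paste0("s", 1:40)))
groups <- rep(c(1, 2), each = 20)
# Assumption for illustration: only the first 15 genes carry signal.
expr[1:15, groups == 2] <- expr[1:15, groups == 2] + 2

# Hypothetical weights: absolute Welch t-statistic per gene.
weights <- vapply(rownames(expr), function(g) {
  abs(unname(t.test(expr[g, groups == 1], expr[g, groups == 2])$statistic))
}, numeric(1))
ranked <- names(sort(weights, decreasing = TRUE))

# Number of samples placed in the wrong cluster when the HC tree is cut in two.
hc_errors <- function(genes, expr, groups) {
  d <- dist(t(expr[genes, , drop = FALSE]))
  cl <- cutree(hclust(d, method = "average"), k = 2)
  agree <- sum(cl == groups)
  length(groups) - max(agree, length(groups) - agree)
}

# Clustering errors when using only the top k genes, for k = 1..114.
errors_by_k <- vapply(seq_along(ranked),
                      function(k) hc_errors(ranked[seq_len(k)], expr, groups),
                      numeric(1))
```

Plotting `errors_by_k` against k (for example `plot(errors_by_k, type = "b")`) shows where the error count levels off, which suggests how small the signature could be.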

Thanks, Manu. Yes, I can use sample() to select random genes and run the process. I thought the whole procedure had already been implemented in some R machine learning package.

Maybe there is... Although I'm an "R guy", I use Weka when I need to perform a machine learning analysis: http://www.cs.waikato.ac.nz/ml/weka/index.html