I want to apply cluster analysis to a DNA microarray dataset.
I am using the Golub leukemia dataset (38 ALL-B, 9 ALL-T and 25 AML; 72 samples × 7129 genes). The dataset needs to be preprocessed first. I found a paper that uses the Golub leukemia dataset, and it describes the preprocessing steps as follows:
First, a floor of 100 and a ceiling of 16000 were set.
Second, the data were filtered to include only genes with max/min > 5 and (max − min) > 500.
Third, the data were transformed to base-10 logarithms.
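
For reference, this is roughly how I understand the three steps above. It is only a minimal sketch with NumPy, assuming the matrix is stored with samples as rows and genes as columns, and assuming max and min are taken per gene across all samples after thresholding. The function name `preprocess` and the toy values are made up for illustration; they are not from the paper.

```python
import numpy as np

def preprocess(X, floor=100.0, ceiling=16000.0, min_ratio=5.0, min_diff=500.0):
    """Threshold, filter and log-transform a samples x genes matrix.

    Assumption: rows are samples, columns are genes.
    """
    # Step 1: floor of 100 and ceiling of 16000
    X = np.clip(np.asarray(X, dtype=float), floor, ceiling)

    # Step 2: keep genes with max/min > 5 and (max - min) > 500,
    # where max and min are taken over the samples for each gene
    gene_max = X.max(axis=0)
    gene_min = X.min(axis=0)
    keep = (gene_max / gene_min > min_ratio) & (gene_max - gene_min > min_diff)

    # Step 3: base-10 logarithm of the remaining genes
    return np.log10(X[:, keep]), keep

# Toy 3 samples x 4 genes matrix (made-up numbers, not the Golub data)
raw = np.array([
    [50.0, 200.0, 120.0, 20000.0],
    [900.0, 300.0, 130.0, 150.0],
    [100.0, 250.0, 110.0, 3000.0],
])
X_log, keep = preprocess(raw)
print(keep)         # boolean mask of the genes that passed the filter
print(X_log.shape)  # (number of samples, number of kept genes)
```

In the toy matrix only the first and last genes vary enough to pass the filter; the two middle genes are dropped.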
Fourth, low-variance genes were excluded from the clustering process: in the paper, only the 200 most variable genes were used to analyze the leukemia data.
I carried out the first three steps and obtained a 72 samples × 3571 genes matrix. Now I want to apply the last step. Maybe it is very easy, but I don't know how to select the 200 most variable genes from the preprocessed 72 × 3571 leukemia dataset.
What does "200 most variable genes" mean, and how can I select the 200 most variable genes from the Golub leukemia dataset? How can I code this step?
Can anyone help me?
I am sorry for such an easy question, but I need some explanation.
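
To make the question concrete: one common reading of "most variable" is to compute the variance of each gene across the 72 samples and keep the 200 genes with the largest variance. I am not sure this is exactly the measure the paper used, so the sketch below is only an assumption. It uses NumPy, assumes samples are rows and genes are columns, and runs on random synthetic data in place of the real preprocessed matrix. The function name `top_variable_genes` is hypothetical.

```python
import numpy as np

def top_variable_genes(X, k=200):
    """Return the k columns (genes) of X with the largest sample variance.

    Assumption: rows are samples, columns are genes, and "most variable"
    means largest variance across samples.
    """
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0, ddof=1)         # one variance per gene
    order = np.argsort(variances)[::-1]       # gene indices, most variable first
    top_idx = np.sort(order[:k])              # keep the original gene order
    return X[:, top_idx], top_idx

# Synthetic stand-in for the preprocessed 72 x 3571 matrix (not real data)
rng = np.random.default_rng(0)
scales = rng.uniform(0.1, 2.0, size=3571)     # give each gene its own spread
X = rng.normal(size=(72, 3571)) * scales

X_top, top_idx = top_variable_genes(X, k=200)
print(X_top.shape)  # (72, 200)
```

`top_idx` holds the column positions of the selected genes, so it can be used to look up the matching gene names. Other measures of variability (for example standard deviation or median absolute deviation) could be swapped in for the variance in the same way.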