Question

Microarray Preprocessing and filtering

8.0 years ago

asiyeulas_196505 • 0

Hi All,

I want to run a clustering analysis on a DNA microarray dataset.

I am using the Golub leukemia dataset (38 ALL-B, 9 ALL-T and 25 AML; 72 samples × 7129 genes). The dataset needs to be preprocessed first. I found a paper that uses the Golub leukemia dataset, and it describes the preprocessing steps as follows:

First, a floor of 100 and a ceiling of 16000 were set.

Second, the data were filtered to include only genes with max/min > 5 and (max − min) > 500.

Third, the data were transformed to base 10 logarithms.

Finally, the low-variance genes were excluded from the clustering process. In this paper, the 200 most variable genes were used to analyze the leukemia data.
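For reference, the first three steps can be sketched in R roughly as follows. This is only a minimal sketch, not the paper's own code: `raw` is a randomly generated stand-in (72 samples in rows, 500 hypothetical genes in columns) rather than the real Golub data, and all variable names are made up for illustration.

```r
# Hypothetical stand-in for the raw expression matrix:
# samples in rows, genes in columns (random values, NOT the Golub data).
set.seed(1)
raw <- matrix(rexp(72 * 500, rate = 1 / 1000), nrow = 72,
              dimnames = list(NULL, paste0("gene", 1:500)))

# Step 1: floor of 100 and ceiling of 16000.
clipped <- pmin(pmax(raw, 100), 16000)

# Step 2: keep genes with max/min > 5 and (max - min) > 500 across samples.
gene_max <- apply(clipped, 2, max)
gene_min <- apply(clipped, 2, min)
keep <- (gene_max / gene_min > 5) & (gene_max - gene_min > 500)
filtered <- clipped[, keep, drop = FALSE]

# Step 3: base-10 logarithm.
logged <- log10(filtered)
dim(logged)
```

Because of the floor in step 1, every value is at least 100, so the ratio in step 2 never divides by zero and the logarithm in step 3 is always defined.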

I did the first 3 steps and obtained a 72 samples × 3571 genes matrix. Now I want to apply the last step. Maybe it is very easy, but I don't know how to select the 200 most variable genes from the preprocessed 72 × 3571 leukemia dataset.

What does "200 most variable genes" mean, and how can I select the 200 most variable genes from the Golub leukemia dataset? How can I code this step?

Can anyone help me?

I am sorry for this very easy question, but I need some explanation.

gene microarray • 1.4k views


score 1 · Answer 1 · 2016-04-12

8.0 years ago

Shicheng Guo ★ 9.4k

You can use var(x) or sd(x) together with order(x) to select the top 200 genes with the highest variance or standard deviation. If you use R, the code is as follows:

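# data: numeric matrix with genes in rows and samples in columns
x <- apply(data, 1, var)  # variance of each gene across samples
# keep the 200 rows (genes) with the largest variance
data_top200 <- data[order(x, decreasing = TRUE)[1:200], ]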
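"Most variable" here means the genes whose expression values vary the most across the samples, ranked by per-gene variance. If the matrix is stored as in the question, with the 72 samples in rows and the 3571 genes in columns, the variance has to be taken over columns instead. A minimal sketch under that assumption, using random numbers in place of the real preprocessed data (`logged`, `gene_var`, `top_idx` and `top200` are illustrative names only):

```r
# Hypothetical stand-in for the preprocessed matrix:
# 72 samples (rows) x 3571 genes (columns), random values only.
set.seed(1)
logged <- matrix(rnorm(72 * 3571), nrow = 72,
                 dimnames = list(NULL, paste0("gene", 1:3571)))

# Variance of each gene across the 72 samples
# (MARGIN = 2 because genes are columns here).
gene_var <- apply(logged, 2, var)

# Column indices of the 200 genes with the largest variance.
top_idx <- order(gene_var, decreasing = TRUE)[1:200]

# Reduced matrix: same 72 samples, only the 200 most variable genes.
top200 <- logged[, top_idx, drop = FALSE]
dim(top200)
```

The resulting 72 × 200 matrix can then be passed to a clustering routine. Mixing up rows and columns at this step selects samples instead of genes, which is one possible reason a port to another language fails to cluster.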

Thank you. I converted your code to MATLAB, but I couldn't cluster the dataset. I followed all the steps in the survey, but it doesn't work. Does anyone have a preprocessed leukemia dataset?

ADD REPLY • link 8.0 years ago by asiyeulas_196505 • 0