Microarray Preprocessing and filtering
8.0 years ago

Hi All,

I want to run a clustering analysis on a DNA microarray dataset.

I am using the Golub leukemia dataset (38 ALL-B, 9 ALL-T and 25 AML; 72 samples × 7129 genes). The dataset needs to be preprocessed first. I found a paper that uses the Golub leukemia dataset, and it describes the preprocessing steps as follows:

First, a floor of 100 and a ceiling of 16000 were set.

Second, the data were filtered to include only genes with max/min > 5 and (max − min) > 500.

Third, the data were transformed to base-10 logarithms.

Finally, the low-variance genes were excluded from the clustering process. In this paper, the 200 most variable genes were used to analyze the leukemia data.
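For reference, the first three steps above (floor/ceiling, max/min filter, log transform) could be sketched in R roughly as follows. This is only a sketch: the function name `preprocess_expression` and its parameter names are hypothetical, the matrix is assumed to have genes in rows and samples in columns, and the toy values are made up for illustration rather than taken from the Golub data.

```r
# Hypothetical helper. `expr` is assumed to be a numeric matrix with
# genes in rows and samples in columns (use t() first if yours is samples x genes).
preprocess_expression <- function(expr, floor_val = 100, ceiling_val = 16000,
                                  min_ratio = 5, min_diff = 500) {
  # Step 1: clamp every value to the range [floor_val, ceiling_val]
  expr[expr < floor_val] <- floor_val
  expr[expr > ceiling_val] <- ceiling_val

  # Step 2: keep genes with max/min > min_ratio and (max - min) > min_diff
  gene_max <- apply(expr, 1, max)
  gene_min <- apply(expr, 1, min)
  keep <- (gene_max / gene_min > min_ratio) & (gene_max - gene_min > min_diff)

  # Step 3: base-10 logarithm of the retained genes
  log10(expr[keep, , drop = FALSE])
}

# Toy data only (not the Golub dataset): 4 genes x 3 samples
toy <- matrix(c(  50,  120, 20000,   # kept: after clamping, ratio 160, difference 15900
                 200,  250,   300,   # dropped: ratio 1.5, difference 100
                 100,  400,   650,   # kept: ratio 6.5, difference 550
                1000, 1200,  1400),  # dropped: ratio 1.4, difference 400
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
processed <- preprocess_expression(toy)
rownames(processed)   # genes that survive the filter
```

Because the floor is applied before the filter, the minimum of every gene is at least 100, so the max/min ratio is always well defined.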

I carried out the first three steps and obtained a 72 samples × 3571 genes matrix. Next I want to apply the last step. Maybe it is very easy, but I don't know how to select the 200 most variable genes from the preprocessed 72 × 3571 leukemia dataset.

What does "200 most variable genes" mean, and how can I select them from the Golub leukemia dataset? How can I code this step?

Can anyone help me?

I am sorry for such an easy question, but I need some explanation.

gene microarray • 1.4k views
8.0 years ago
Shicheng Guo ★ 9.4k

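The "most variable" genes are the ones whose expression values have the largest variance (or standard deviation) across samples. You can use var(x) or sd(x) together with order(x) to select the top 200 genes with the highest variance. If you use R, the code is as follows:

# Assumes genes are in rows and samples in columns; use t(data) first if not.
x <- apply(data, 1, var)
data_top200 <- data[order(x, decreasing = TRUE)[1:200], ]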
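If your matrix is laid out as samples × genes, as in the 72 × 3571 matrix from the question, the variance has to be taken over columns instead of rows. A minimal sketch under that assumption follows; the function name `select_top_variable` is hypothetical, and the data are simulated, not the real leukemia values.

```r
# Hypothetical helper. `expr` is assumed to be a samples x genes numeric
# matrix, matching the 72 x 3571 layout described in the question.
select_top_variable <- function(expr, n_top = 200) {
  gene_var <- apply(expr, 2, var)    # one variance per gene (column)
  n_top <- min(n_top, ncol(expr))    # guard against inputs with fewer genes
  top_idx <- order(gene_var, decreasing = TRUE)[seq_len(n_top)]
  expr[, top_idx, drop = FALSE]      # columns come back sorted by decreasing variance
}

# Simulated data only (not the Golub dataset)
set.seed(1)
sim <- matrix(rnorm(72 * 3571), nrow = 72, ncol = 3571)
top200 <- select_top_variable(sim)
dim(top200)   # 72 200
```

Ranking by sd instead of var gives the same selection, because the standard deviation is a monotone function of the variance.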

Thank you. I converted your code to MATLAB, but I couldn't cluster the dataset. I followed all the steps in the survey, but it doesn't work. Does anyone have a preprocessed leukemia dataset?
