Question: Differentially expressed genes in Microarray data by Wilcoxon-test
4.2 years ago by
zaynabmousavian10 wrote:

Hi,

I have two microarray datasets from the Affymetrix platform, and I have normalized them using the gcrma package.

One dataset has 27 samples and the other has 66, and I want to identify the differentially expressed genes between the two datasets by running wilcox.test in R. Because I have no idea about the distribution of the gene expression values, I cannot use t.test. After running wilcox.test, the p-values for most of the genes are less than 0.001!

How can I identify the genes with significant differential expression between these datasets?

Thanks in advance.

R • 2.2k views
modified 4.1 years ago by TriS3.6k • written 4.2 years ago by zaynabmousavian10
4.2 years ago by
Québec
Maxime Lamontagne2.1k wrote:

Questions:

Did you normalize your datasets separately and then combine the expression values? If the datasets were not normalized together, their expression values could differ a lot.

Is the platform the same for the two datasets? (species, platform version)
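To illustrate why this matters, here is a minimal simulation sketch in base R. It is not the poster's data: the gene count, the expression values, and the 1.5 log2-unit offset are all made-up assumptions. It shows that a constant shift between two separately processed datasets can be enough to make the Wilcoxon test call most genes significant, even when no gene truly differs.

```r
set.seed(1)
n_genes <- 500        # hypothetical number of genes
n1 <- 27; n2 <- 66    # sample sizes taken from the question

# Every gene has the same true distribution in both datasets,
# so there is no real differential expression here.
d1 <- matrix(rnorm(n_genes * n1, mean = 8, sd = 1), nrow = n_genes)
d2 <- matrix(rnorm(n_genes * n2, mean = 8, sd = 1), nrow = n_genes)

# Hypothetical dataset-wide offset, e.g. from separate normalization
d2_shifted <- d2 + 1.5

# One Wilcoxon rank-sum test per gene (row)
wilcox_p <- function(a, b) {
  sapply(seq_len(nrow(a)), function(g) wilcox.test(a[g, ], b[g, ])$p.value)
}

p_no_shift <- wilcox_p(d1, d2)
p_shift <- wilcox_p(d1, d2_shifted)

# Fraction of genes with p < 0.001 in each scenario
frac_no_shift <- mean(p_no_shift < 0.001)
frac_shift <- mean(p_shift < 0.001)
print(c(no_shift = frac_no_shift, shift = frac_shift))
```

If the real data behave like the shifted scenario, the small p-values may reflect a dataset effect rather than biology, which is why checking the normalization comes first.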

 


The platform of both datasets is similar, but I normalized each dataset separately: after reading the CEL files with the affy package, I normalized each one with the gcrma package. Is there any problem?

written 4.2 years ago by zaynabmousavian10

You should extract the raw data separately (R package affy or Affymetrix Power Tools), combine your unadjusted data, and then normalize with gcrma. Which parameters are you adjusting for?

written 4.2 years ago by Maxime Lamontagne2.1k

 

I have executed the following commands on each dataset separately (cels contains the names of the .CEL files belonging to that dataset):

raw.data=ReadAffy(verbose=TRUE, filenames=cels, cdfname="HGU133A_HS_ENTREZG") 

data.gcrma.norm=gcrma(raw.data)  

Is it correct to read all cel files of both samples in one ReadAffy command and then normalize them by gcrma? 

 

modified 4.2 years ago • written 4.2 years ago by zaynabmousavian10
4.2 years ago by
Québec
Maxime Lamontagne2.1k wrote:

"Is it correct to read all cel files of both samples in one ReadAffy command and then normalize them by gcrma?"

You can try that. If this does not work, do your normalization separately and then adjust for the cohort:

 

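library(MASS)  # provides rlm() for robust linear regression

# data: samples in rows; column 1 is Cohort (coercible to numeric), the remaining columns are genes
adjusted <- as.data.frame(matrix(NA, nrow = nrow(data), ncol = ncol(data)))
colnames(adjusted) <- colnames(data)
adjusted[, 1] <- data[, 1]    # Cohort

# Regress each gene on the cohort and keep the residuals as the adjusted expression
for (i in 2:ncol(data)) {
    res <- residuals(rlm(as.numeric(as.matrix(data[, i])) ~ as.numeric(as.matrix(data$Cohort)), method = "MM", na.action = na.exclude))
    adjusted[, i] <- res
}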

 

modified 4.2 years ago • written 4.2 years ago by Maxime Lamontagne2.1k
4.1 years ago by
United States, Buffalo
TriS3.6k wrote:

"I have not any idea about the distribution of gene expressions I can not use the t.test"

if you cannot assume that your data are normally distributed, parametric tests such as t.test may not be appropriate; that's why you run a non-parametric test like wilcox.test.

after you obtain the p-values you should correct them for multiple testing (p-value adjustment). I'd suggest trying FDR correction (Benjamini-Hochberg method). Bonferroni is more stringent and you might end up with fewer genes.
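as a rough sketch of that workflow in base R: one wilcox.test per gene, then p.adjust for the correction. The expression matrix, the group labels, and the 20 "truly different" genes below are simulated assumptions for illustration only; with real data you would use your own normalized matrix (genes in rows, samples in columns).

```r
set.seed(42)
n_genes <- 200   # hypothetical number of genes
group <- factor(rep(c("A", "B"), times = c(27, 66)))  # sample sizes from the question

# Simulated log2 expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 1), nrow = n_genes)

# Make the first 20 genes truly different in group B (made-up effect of 2 log2 units)
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 2

# One Wilcoxon rank-sum test per gene
p_raw <- apply(expr, 1, function(x) {
  wilcox.test(x[group == "A"], x[group == "B"])$p.value
})

# Multiple-testing correction
p_bh <- p.adjust(p_raw, method = "BH")            # Benjamini-Hochberg FDR
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # more stringent

# Genes called significant at an adjusted threshold of 0.05
sig_bh <- which(p_bh < 0.05)
sig_bonf <- which(p_bonf < 0.05)
print(c(BH = length(sig_bh), Bonferroni = length(sig_bonf)))
```

the Bonferroni-adjusted p-value is never smaller than the BH-adjusted one, so the Bonferroni gene list is always a subset of the BH list at the same threshold.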

another approach would be, after using affy to normalize, to use the limma package to analyze the data

 
