Question

Differentially expressed genes in Microarray data by Wilcoxon-test

9.6 years ago

zaynabmousavian ▴ 10

Hi,

I have two microarray datasets from the Affymetrix platform, and I have normalized them using the gcrma package.

One dataset has 27 samples and the other has 66. I want to identify the differentially expressed genes between the two datasets by running wilcox.test in R; because I have no idea about the distribution of the gene expression values, I cannot use t.test. After running wilcox.test, the p-values for most of the genes are less than 0.001!

How can I identify genes with significant differential expression between these datasets?

Thanks in advance.

R • 4.5k views

Ram · Answer 1 · 2014-12-11

9.6 years ago

Maxime Lamontagne ★ 2.3k

Questions:

Did you normalize your datasets separately and then combine the expression values? If the datasets were not normalized together, their expression values could differ a lot.

Is the platform the same for the two datasets? (species, platform version)

The platform of both datasets is similar, but I normalized each dataset separately: after reading the CEL files with the affy package, I normalized each one with the gcrma package. Is that a problem?

9.6 years ago by zaynabmousavian ▴ 10

You should extract the raw data separately (R package affy or Affymetrix Power Tools), combine your unadjusted data, and then normalize with gcrma. Which parameters are you adjusting for?

9.6 years ago by Maxime Lamontagne ★ 2.3k

I ran the following commands on each dataset separately (cels contains the names of the .CEL files belonging to that dataset):

raw.data=ReadAffy(verbose=TRUE, filenames=cels, cdfname="HGU133A_HS_ENTREZG")

data.gcrma.norm=gcrma(raw.data)

Is it correct to read all CEL files of both datasets in one ReadAffy command and then normalize them with gcrma?

updated 2.4 years ago by Ram 44k • written 9.6 years ago by zaynabmousavian ▴ 10

Hi Maxime, I have a similar challenge. I am trying to run a DEG analysis on two different datasets. I want to normalize my datasets together and then combine the expression values. However, the platforms differ: the first dataset is a hybrid of GPL96 (HG-U133A) and GPL97 (HG-U133B), while the second dataset's platform is GPL10558. How do I combine the two datasets so that I can normalize them together before running the DEG analysis?

3.9 years ago by chiagozieduru • 0

Ram · Answer 2 · 2014-12-15

Is it correct to read all cel files of both samples in one ReadAffy command and then normalize them by gcrma?

You can try that. If that does not work, do your normalization separately and then adjust for the cohort:

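library(MASS)  # provides rlm() for robust regression

# Assumes data has one row per sample, the cohort label in column 1 (Cohort),
# and one column of expression values per gene after it.
adjusted <- as.data.frame(matrix(NA, nrow = nrow(data), ncol = ncol(data)))
colnames(adjusted) <- colnames(data)
adjusted[, 1] <- data[, 1]    # Cohort
cohort <- factor(data$Cohort)

for (i in 2:ncol(data)) {
    # Regress gene i on cohort; the residuals are the cohort-adjusted values
    fit <- rlm(as.numeric(as.matrix(data[, i])) ~ cohort, method = "MM", na.action = na.exclude)
    adjusted[, i] <- residuals(fit)
}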

Ram · Answer 3 · 2015-01-08

I have not any idea about the distribution of gene expressions I can not use the t.test

If you cannot assume that your data are normally distributed, parametric tests such as t.test may not be appropriate; that is why you run a non-parametric test like wilcox.test.

After you obtain the p-values, you should correct them for multiple testing (p-value adjustment). I'd suggest trying FDR correction (Benjamini-Hochberg method). Bonferroni is more stringent, so you might end up with fewer genes.
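As a rough illustration of the correction step, here is a minimal sketch using only base R. The expression matrix is simulated (it is not the poster's data), and the names expr, group, p_raw, p_bh and p_bonf are made up for the example; only the group sizes 27 and 66 come from the question.

```r
# Simulated stand-in for a genes x samples expression matrix (assumption:
# real data would come from the normalized arrays instead).
set.seed(42)
n_genes <- 200
group <- factor(rep(c("A", "B"), times = c(27, 66)))
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Shift the first 20 genes in group B so that some genes truly differ
expr[1:20, group == "B"] <- expr[1:20, group == "B"] + 1.5

# One Wilcoxon rank-sum test per gene (row)
p_raw <- apply(expr, 1, function(x) {
  wilcox.test(x[group == "A"], x[group == "B"])$p.value
})

# Multiple-testing correction: Benjamini-Hochberg (FDR) and Bonferroni
p_bh   <- p.adjust(p_raw, method = "BH")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Genes called significant at 0.05 under each correction
sig_bh   <- names(p_bh)[p_bh < 0.05]
sig_bonf <- names(p_bonf)[p_bonf < 0.05]
```

Every gene in sig_bonf is also in sig_bh, because the Bonferroni-adjusted p-value is never smaller than the BH-adjusted one. Note that correction does not help if the two datasets differ systematically for technical reasons; in that case most genes will still come out significant.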

Another approach would be, after using affy to normalize, to use the limma package to analyze the data.
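A two-group limma analysis could look roughly like the sketch below. It assumes the Bioconductor package limma is installed and again uses a simulated matrix in place of the real normalized expression values; expr, group, design and results are illustrative names, not anything from the original posts.

```r
# Sketch only: runs the analysis if limma is available, otherwise says so.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  group <- factor(rep(c("A", "B"), times = c(27, 66)))

  # Simulated genes x samples matrix standing in for log2 expression values
  expr <- matrix(rnorm(100 * length(group)), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0("s", seq_along(group))))
  expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 2

  # Design matrix: intercept plus a coefficient for group B versus group A
  design <- model.matrix(~ group)

  # Fit one linear model per gene, then moderate the variances
  fit <- limma::eBayes(limma::lmFit(expr, design))

  # Full table for the B-versus-A coefficient with BH-adjusted p-values
  results <- limma::topTable(fit, coef = 2, number = Inf, adjust.method = "BH")
} else {
  message("limma is not installed; install it from Bioconductor to run this sketch")
}
```

The adj.P.Val column of results holds the BH-adjusted p-values, and logFC gives the estimated difference between the two groups for each gene.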