Question: gene co-expression on microarray assembled from GDS datasets is influenced by strong log2 negative outliers
4.6 years ago by
grokaine20 wrote:

I need to compute gene co-expression for a compendium of GEO microarrays. I downloaded a number of GDS datasets corresponding to two GPL platforms and merged them into a single gene expression table. After log2 transforming them I obtained a lot of negative values. The negative values come from small expression values (between 0 and 1), but after the log2 transformation they become outliers, and these outliers influence any type of co-expression measurement. The GDS datasets are supposed to be both background corrected and normalized, but I performed quantile normalization to re-align the probe distributions among datasets. I still have too many negative values though.

How would you recommend I proceed?

  1. Download the raw .CEL files and perform uniform background correction/normalization? I have seen people say that this improves the overall quality, but I am not convinced, mainly because these operations are mostly performed to eliminate consistent noise due to specific experimental conditions. Also, negative values are already present in the GDS datasets after all the statistical processing, so what guarantees that I will not end up in the same situation, especially since I will use many different experiments?
  2. Add 1.0 to all expression values before log2 transforming them. This is my favored solution.
  3. Not using any log2 transformation (why is this used anyway?). However this would make outliers even stronger.
  4. ???
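
To make the effect behind options 2 and 3 concrete, here is a minimal sketch using only the Python standard library. The two probes and their intensities are made up for illustration (they are not taken from any GDS dataset), and `pearson` is a small helper written for this example. It compares plain log2 with log2(x + 1) on a probe that has one near-zero value.

```python
import math

# Hypothetical intensities for two probes across six samples.
# gene_a has one value between 0 and 1, which becomes negative under log2.
gene_a = [0.02, 150.0, 310.0, 620.0, 1200.0, 2500.0]
gene_b = [140.0, 160.0, 300.0, 650.0, 1100.0, 2400.0]

def log2_plain(values):
    """Plain log2; values in (0, 1) map to negative numbers."""
    return [math.log2(v) for v in values]

def log2_plus_one(values):
    """log2(x + 1); non-negative inputs map to non-negative outputs."""
    return [math.log2(v + 1.0) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

plain_a = log2_plain(gene_a)
shifted_a = log2_plus_one(gene_a)

# Co-expression of the two probes under each transformation.
r_plain = pearson(plain_a, log2_plain(gene_b))
r_shifted = pearson(shifted_a, log2_plus_one(gene_b))

print("min log2(x):    ", min(plain_a))
print("min log2(x + 1):", min(shifted_a))
print("Pearson r, log2(x):    ", r_plain)
print("Pearson r, log2(x + 1):", r_shifted)
```

Adding 1 keeps the smallest transformed value at or above zero, but the low sample still sits far from the rest of the probe's values, so the shift makes the outlier less extreme rather than removing it.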
4.6 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

There is a correlation (in the qualitative sense, not in the quantitative sense) between low expression values on an array and variance. This is irrespective of the presence or absence of negative values, so I wouldn't focus on the negative values. Log transformation is used to bring the expression measures to a more bell-shaped distribution and to make the variance across expression values more similar. 
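
As a rough illustration of the variance point, the sketch below uses two made-up probes whose values spread by the same relative amount (about 10%) around very different expression levels. The numbers are hypothetical and chosen only to show the effect; real arrays will not behave this cleanly.

```python
import math
from statistics import pvariance

# Hypothetical probes: same relative spread, 100-fold different scale.
low_gene = [90.0, 100.0, 110.0]
high_gene = [9000.0, 10000.0, 11000.0]

# On the raw scale the variance grows with the expression level.
raw_var_low = pvariance(low_gene)
raw_var_high = pvariance(high_gene)

# On the log2 scale a constant fold difference becomes a constant shift,
# so the two probes end up with (nearly) the same variance.
log_var_low = pvariance([math.log2(v) for v in low_gene])
log_var_high = pvariance([math.log2(v) for v in high_gene])

print("raw variances: ", raw_var_low, raw_var_high)
print("log2 variances:", log_var_low, log_var_high)
```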

I would suggest getting the .CEL files and normalizing with RMA or frozen RMA. I would not manipulate the output in an ad hoc manner without good evidence to do so; there is over a decade of experience with Affy microarrays that you would potentially be invalidating by doing ad hoc stuff.

Finally, since you are interested in correlations, you can use variance filters on the features to remove those that show little or no variance, since these are unlikely to show strong correlations. This will functionally remove the lowest expressed features as well.
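
A variance filter of this kind could be sketched as follows, using only the Python standard library. The probe names, the values, and the choice to keep the top 50% of features are all assumptions made for the example; a sensible cutoff depends on the data.

```python
from statistics import pvariance

# Hypothetical log2 expression table: probe ID -> values across samples.
expression = {
    "probe_1": [7.1, 7.0, 7.2, 7.1],   # nearly flat
    "probe_2": [4.0, 9.5, 6.2, 11.3],  # highly variable
    "probe_3": [2.1, 2.0, 2.1, 2.0],   # low and flat
    "probe_4": [8.0, 10.1, 6.5, 9.2],  # moderately variable
}

def variance_filter(table, keep_fraction):
    """Keep the top `keep_fraction` of features, ranked by variance."""
    ranked = sorted(table, key=lambda probe: pvariance(table[probe]), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    return {probe: values for probe, values in table.items() if probe in kept}

filtered = variance_filter(expression, keep_fraction=0.5)
print(sorted(filtered))  # ['probe_2', 'probe_4']
```

In this toy table the flat, low-expressed `probe_3` is dropped along with the flat but higher `probe_1`, which is the side effect mentioned above.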



Yup, I also remove the low-variance features. I now realize that the boxplot is not very informative in that respect, so maybe I should redo it after filtering out the low-variance features.

The GDS datasets are supposed to be already background corrected and normalized, and I am using them as they are, so I am not manipulating anything. I made a boxplot of all the samples and it looks obvious that the GDS datasets are well made. I am re-normalizing, though, to align the GDS datasets better (each GDS block has a slightly different median and dispersion). I do not see a scientific fallacy in my approach (and it is used in multi-platform microarray assemblies). Of course, ultimately it all depends on hardline reviewers.
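
For readers unfamiliar with the re-alignment step, quantile normalization (the method named in the question) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the exact procedure used here: samples are given as equal-length lists, the sample values are invented, and ties are broken by position rather than averaged as some implementations do.

```python
def quantile_normalize(samples):
    """Force every sample to share the same value distribution.

    `samples` is a list of equal-length lists, one list per array/sample.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(sample) for sample in samples]
    # Reference distribution: mean across samples of the values at each rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for sample in samples:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_probes), key=lambda i: sample[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical samples (four probes each) with different medians and spreads.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
for sample in normalized:
    print(sample)
```

After this step every sample contains exactly the same set of values, so medians and dispersions line up across blocks, while the rank order of probes within each sample is preserved. It does not by itself remove low or negative values.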

Reply modified 4.6 years ago • written 4.6 years ago by grokaine20
4.6 years ago by
Manvendra Singh2.1k
Berlin, Germany
Manvendra Singh2.1k wrote:

It is best to process from the .CEL files.

Use the affy package; processing with it is easy and quick. Detect signals up to a threshold, then transform to the log scale.

Negative values mean that the signals are less than one; these would be filtered out when you correct the .CEL files.

Use the log2 scale, otherwise there would be a lot of variance during comparison.
