Question

Using RLE (Relative Log Expression) Mean Values of Microarray Data to Adjust for Batch Effects

11.5 years ago

Luke ▴ 40

I am analyzing microarray data generated on Illumina HumanHT-12 chips, and the samples were processed in multiple batches. The data I have has been through the 'standard' GenomeStudio normalization steps, but has not been adjusted for any batch effects.

In analyses testing an outcome of interest against the expression values, it is common to 'adjust' for batch effects by including batch as an independent variable, coded as a factor.
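As a rough illustration of what adjusting for batch as a factor does for a single probe, the sketch below relies on the Frisch-Waugh-Lovell result: in a least-squares fit of expression on the outcome plus batch indicators, the coefficient on the outcome equals the slope obtained after centering both variables within each batch. The function name `batch_adjusted_slope` and its inputs are hypothetical, and plain Python is used only to keep the example self-contained; it is not taken from any package mentioned in this thread.

```python
from collections import defaultdict


def batch_adjusted_slope(expr, outcome, batch):
    """Slope of expr on outcome with batch adjusted for as a factor.

    expr, outcome: lists of numbers, one entry per sample (hypothetical inputs).
    batch: list of batch labels, one entry per sample.
    """

    def center_within_batch(values):
        # Group the values by batch label and subtract each batch's mean.
        groups = defaultdict(list)
        for value, label in zip(values, batch):
            groups[label].append(value)
        means = {label: sum(vs) / len(vs) for label, vs in groups.items()}
        return [value - means[label] for value, label in zip(values, batch)]

    y = center_within_batch(expr)
    x = center_within_batch(outcome)
    # Ordinary least-squares slope on the batch-centered data.
    # Assumes the outcome varies within at least one batch (non-zero denominator).
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
```

Because each batch gets its own intercept, a constant shift in expression between batches does not change this slope. The factor carries no information about how large the shift is until the model is fitted.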

I have also seen elsewhere that analysts may adjust for the relative log expression (RLE) mean to account for technical bias. RLE means are more commonly used to assess batch effects using boxplots. I can see from boxplots of my data that a couple of the batches, but not all, have clearly higher RLE means.
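For reference, a minimal sketch of how a per-sample RLE mean is usually computed, assuming the values are already on the log2 scale: for each probe, subtract that probe's median across all samples, then average the resulting deviations within each sample. The function name `rle_means` and the dictionary layout are hypothetical choices for illustration only.

```python
from statistics import mean, median


def rle_means(log_expr):
    """Per-sample mean relative log expression.

    log_expr: dict mapping sample name -> list of log2 expression values,
    with probes in the same order for every sample (hypothetical layout).
    """
    samples = list(log_expr)
    n_probes = len(log_expr[samples[0]])
    # Median of each probe across all samples acts as the reference array.
    probe_medians = [
        median(log_expr[s][i] for s in samples) for i in range(n_probes)
    ]
    # RLE for a probe is its log expression minus the probe median;
    # the per-sample summary is the mean of those deviations.
    return {
        s: mean(v - m for v, m in zip(log_expr[s], probe_medians))
        for s in samples
    }
```

A sample whose RLE mean sits well away from zero is shifted relative to the typical array, which is what the boxplots show graphically.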

My question is which method most accurately accounts for the technical variability introduced by the batches?

My feeling is that using the RLE mean values is best because, not only is this a continuous variable, but it is actually based on the data! The batches may not necessarily have affected the expression, so including them as covariates anyway must introduce some noise into the model, whereas the RLE mean values, which are based exclusively on the expression data itself, will only account for the observed technical variation when included as a covariate. Is this rationale logical? Have I overlooked anything? Many thanks.

microarray expression analysis statistics r • 6.7k views

updated 11.5 years ago by brentp 24k • written 11.5 years ago by Luke ▴ 40

score 4 · Accepted Answer · 2012-10-24

This article compared several methods of removing batch effects: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0017238

By their metrics, ComBat ( http://www.bu.edu/jlab/wp-assets/ComBat/Abstract.html ) performed the best.

ComBat is available in the R/Bioconductor package sva: http://www.bioconductor.org/packages/release/bioc/html/sva.html

The advantage of using these methods over simpler approaches is that some of them can remove batch effects while "protecting" your model of interest.