Question

How To Transform Microarray Data To Adjust For Batch Effects



13.5 years ago

David Quigley 11k

I've downloaded someone else's microarray data (Affymetrix HG-U133 Plus 2, normalized with GCRMA) and noticed that many genes were unexpectedly differentially expressed by patient sex (about 30 males and 30 females). Although a few genes (e.g. the Y-chromosome gene EIF1AY) show obvious sex linkage in any human sample, in my experience such effects are not usually so strong or pervasive. When I checked the headers in the CEL files, I found a very strong batch effect: arrays processed in years one and two came overwhelmingly from males, while those processed in year three were all from females. I concluded that the effect is due to technical variation, or at least that it cannot be distinguished from such bias.
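One quick way to put a number on how badly batch and sex are confounded is Cramér's V on the batch-by-sex contingency table: 0 means the two labelings are independent, 1 means one fully determines the other. The sketch below is plain standard-library Python; the function names are hypothetical, and the batch/sex labels in the usage example are made up to mimic the pattern described above, not taken from the actual CEL headers.

```python
from collections import Counter
from math import sqrt


def crosstab(batches, sexes):
    """Count samples for each (batch, sex) pair."""
    return Counter(zip(batches, sexes))


def cramers_v(batches, sexes):
    """Cramér's V between two categorical labelings of the same samples.

    Returns 0.0 when the labelings are independent and 1.0 when one
    completely determines the other (total confounding).
    """
    n = len(batches)
    table = crosstab(batches, sexes)
    row_totals = Counter(batches)
    col_totals = Counter(sexes)
    chi2 = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        raise ValueError("need at least two batches and two sexes")
    return sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical labels: years 1 and 2 all male, year 3 all female.
    batches = ["y1"] * 15 + ["y2"] * 15 + ["y3"] * 30
    sexes = ["M"] * 30 + ["F"] * 30
    print(cramers_v(batches, sexes))  # prints 1.0
```

A value near 1 says that no downstream method can separate sex from batch in this data; a batch correction will remove both.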

Many tools, such as SAM, allow you to specify batches. However, I want to do downstream analysis using my own methods. What is the best way to transform the data set to reduce the batch effect? I am resigned to losing any ability to detect true sex-specific gene expression. If I were only fitting linear models, I could include batch as a factor in the model. However, I'd also like to, for example, compute Spearman rank correlations, and there I don't see an obvious way to account for batch.
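One simple option for the Spearman case is to do the "batch as a factor" step up front: for each gene, subtract that gene's mean within each batch (which is the same as taking residuals from a one-way linear model on batch), then rank the residuals. This only removes additive (location) shifts, not batch differences in variance, and it is a swapped-in illustration rather than ComBat. The sketch below uses only the standard library; all function names and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from that batch's values (one gene)."""
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    batch_mean = {b: mean(vs) for b, vs in groups.items()}
    return [v - batch_mean[b] for v, b in zip(values, batches)]


def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (raises ZeroDivisionError if either input is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy data: two genes anti-correlated within each batch,
    # but both shifted upward in batch B.
    batches = ["A"] * 3 + ["B"] * 3
    g1 = [1, 2, 3, 11, 12, 13]
    g2 = [3, 2, 1, 13, 12, 11]
    print(spearman(g1, g2))  # about 0.543: the batch shift dominates
    print(spearman(center_by_batch(g1, batches),
                   center_by_batch(g2, batches)))  # -1.0
```

The toy example shows why the order matters: ranks computed across all samples mix the batch shift into the correlation, so centering has to happen before ranking.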

A quick literature search turned up Johnson et al., "Adjusting batch effects in microarray expression data using empirical Bayes methods" (Biostatistics, 2007), which in turn cites Benito et al., "Adjustment of systematic microarray data biases" (Bioinformatics, 2003). Before I dive in any further, would anyone with expertise in this area care to comment on best practices?
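For orientation, my reading of the Johnson et al. model is a per-gene location/scale model: for gene g in sample j of batch i, an additive batch effect γ and a multiplicative batch effect δ are estimated, shrunk toward batch-wide priors with empirical Bayes, and then removed. The notation below is paraphrased, so check it against the paper before relying on it:

```latex
Y_{ijg} = \alpha_g + X\beta_g + \gamma_{ig} + \delta_{ig}\,\varepsilon_{ijg},
\qquad \varepsilon_{ijg} \sim N(0, \sigma_g^2)

Z_{ijg} = \frac{Y_{ijg} - \hat{\alpha}_g - X\hat{\beta}_g}{\hat{\sigma}_g}

Y^{*}_{ijg} = \frac{\hat{\sigma}_g}{\hat{\delta}^{*}_{ig}}
  \left( Z_{ijg} - \hat{\gamma}^{*}_{ig} \right) + \hat{\alpha}_g + X\hat{\beta}_g
```

Here \hat{\gamma}^{*}_{ig} and \hat{\delta}^{*}_{ig} are the empirical Bayes estimates of the batch effects, and X holds the biological covariates to preserve. In this data set sex could not usefully go into X, because it would be nearly collinear with batch.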



score 4 · Answer 1 · 2010-10-25



13.5 years ago

User 59 13k

I have always used ComBat.R (from the Johnson et al. Biostatistics paper you mention) to correct batch effects. It has performed very well on our datasets with marked batch variation. I can't say it's best practice, but I can certainly recommend it.
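To make the idea behind ComBat concrete, here is a stripped-down per-gene location/scale adjustment: each batch is standardized to its own mean and standard deviation, then mapped back onto the gene's overall mean and pooled within-batch standard deviation. This is NOT ComBat: it skips the empirical Bayes shrinkage (which matters most for small batches) and takes no covariates. It is a hypothetical standard-library sketch, not the ComBat.R code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def location_scale_adjust(values, batches):
    """Per-gene batch location/scale adjustment without shrinkage or covariates.

    After adjustment every batch has the gene's overall mean and the
    pooled within-batch standard deviation.
    """
    groups = defaultdict(list)
    for v, b in zip(values, batches):
        groups[b].append(v)
    overall_mean = mean(values)
    # Per-batch mean and (population) standard deviation.
    stats = {b: (mean(vs), pstdev(vs)) for b, vs in groups.items()}
    # Pooled within-batch SD serves as the gene-wide scale.
    ss = sum((v - stats[b][0]) ** 2 for v, b in zip(values, batches))
    pooled_sd = (ss / len(values)) ** 0.5
    adjusted = []
    for v, b in zip(values, batches):
        mu_b, sd_b = stats[b]
        if sd_b == 0:
            # Crude fallback: a constant (or single-sample) batch carries
            # no scale information, so map it to the overall mean.
            adjusted.append(overall_mean)
        else:
            adjusted.append(overall_mean + pooled_sd * (v - mu_b) / sd_b)
    return adjusted


if __name__ == "__main__":
    batches = ["A"] * 3 + ["B"] * 3
    gene = [1, 2, 3, 11, 13, 15]  # toy values: batch B is shifted and more spread out
    print(location_scale_adjust(gene, batches))
```

Applied gene by gene, the output can then be fed into any downstream method, including rank correlation. With sex confounded with batch as in the question, this (like ComBat) will also remove most of the sex signal.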
