Question

Time point microarray data correcting for batch effects

7.9 years ago

tleona3 ▴ 10

Hi All,

I'm looking at an old microarray gene expression dataset and have a question about correcting for chip-to-chip batch effects. The samples were run on the Affymetrix Mouse Genome 430 2.0 Array.

The experimental design is the following: 3 biological replicates each of paired mouse cutaneous skin (c) and oral mucosa (m), taken at 8 time points (including t0 = control), for a total of 48 samples. My concern is that, according to the chip hybridization data, the cutaneous and mucosa samples from any given time point were all hybridized on the same chip. Since I'm looking for differential expression changes over time, I'm worried that I cannot correct for batch effects with this layout. Please see below:

Chip 1: t0_c1, t0_c2, t0_c3, t1_c1, t1_c2, t1_c3, t0_m1, t0_m2, t0_m3, t1_m1, t1_m2, t1_m3

Chip 2: t2_c1, t2_c2, t2_c3, t3_c1, t3_c2, t3_c3, t2_m1, t2_m2, t2_m3, t3_m1, t3_m2, t3_m3

Chip 3: t4_c1, t4_c2, t4_c3, t5_c1, t5_c2, t5_c3, t4_m1, t4_m2, t4_m3, t5_m1, t5_m2, t5_m3

Chip 4: t6_c1, t6_c2, t6_c3, t7_c1, t7_c2, t7_c3, t6_m1, t6_m2, t6_m3, t7_m1, t7_m2, t7_m3

On a PCA plot of the normalized data, the samples group according to the chip they were run on: t0_c with t1_c, t0_m with t1_m, and so on. Is there any way I can correct for this batch effect given how the samples were distributed across the chips?
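
The layout above can be checked programmatically. The following is a minimal Python sketch (an illustration only, not part of the original analysis, which would be done in R); the dictionary and variable names are made up for this example. It rebuilds the 48-sample layout from the chip list and confirms that every time point sits on exactly one chip, then lists which pairs of time points share a chip.

```python
from collections import defaultdict
from itertools import combinations

# Chip layout copied from the post: chip number -> time points run on it.
chip_to_times = {1: [0, 1], 2: [2, 3], 3: [4, 5], 4: [6, 7]}

# Expand to one record per sample: 2 tissues x 3 replicates per time point.
samples = [
    {"chip": chip, "time": t, "tissue": tissue, "rep": rep}
    for chip, times in chip_to_times.items()
    for t in times
    for tissue in ("c", "m")
    for rep in (1, 2, 3)
]

# For each time point, collect the set of chips it appears on.
chips_per_time = defaultdict(set)
for s in samples:
    chips_per_time[s["time"]].add(s["chip"])

# True when each time point is found on a single chip (time nested in chip).
time_nested_in_chip = all(len(chips) == 1 for chips in chips_per_time.values())

# Split all pairwise time comparisons by whether both time points share a chip.
time_points = sorted(chips_per_time)
within_chip = [
    (a, b) for a, b in combinations(time_points, 2)
    if chips_per_time[a] == chips_per_time[b]
]
across_chip = [
    (a, b) for a, b in combinations(time_points, 2)
    if chips_per_time[a] != chips_per_time[b]
]

print(len(samples), time_nested_in_chip)
print("within-chip pairs:", within_chip)
print("across-chip pairs:", len(across_chip))
```

Only the four within-chip pairs (t0/t1, t2/t3, t4/t5, t6/t7) compare time points that share a chip; the other 24 pairwise comparisons cross a chip boundary.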

microarray combat batch-effect • 2.1k views

ADD COMMENT • link updated 14 months ago by Ram 45k • written 7.9 years ago by tleona3 ▴ 10

score 2 · Accepted Answer · 2017-08-21

7.9 years ago

mforde84 ★ 1.4k

If you know the batches, you can use ComBat from the sva R package or the removeBatchEffect() function in the limma R package. If you don't know the batches, I think sva has additional options for you, such as estimating surrogate variables.

Also, I've seen people control for batch by including it as a coefficient in a linear model. For example:

expression ~ time + drug_treatment + batch + ... etc etc etc

but note that this assumes normally distributed data. Since it's array data, the normalized data are most likely already log2-transformed, but read the documentation to make sure, or plot a few genes and see if they exhibit roughly normal distributions.
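
As a quick sanity check before reading the documentation, one rough heuristic is to look at the largest value: a raw intensity of 65,536 corresponds to a log2 value of 16, so log2-scale data rarely exceeds about 20, while raw-scale data usually runs into the thousands. The Python sketch below is only an illustration of that idea; the function name `looks_log2`, the cutoff of 20, and the example values are assumptions made here, not anything from sva or limma.

```python
import math


def looks_log2(values, max_log2=20.0):
    """Rough heuristic: True if the largest value fits a log2 scale.

    The cutoff of 20 is an assumption chosen for illustration; it is not
    a substitute for checking how the data were normalized.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        raise ValueError("no finite expression values to check")
    return max(finite) <= max_log2


# Made-up example intensities on the raw scale, and the same values logged.
raw = [35.0, 120.5, 980.0, 15230.0, 48000.0]
logged = [math.log2(v) for v in raw]

print(looks_log2(raw))
print(looks_log2(logged))
```

A check like this only flags the obvious case; the normalization method's documentation remains the authoritative source.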

ADD COMMENT • link 7.9 years ago by mforde84 ★ 1.4k

Hi mforde84,

Thanks for your response. I'm afraid I might not have been clear in my question. I was planning to use ComBat to correct for chip-to-chip batch effects, but given the sample layout across chips (see original post) and the experimental question I'm trying to answer, I'm worried I won't be able to correct for the batch effects without altering the analysis.

My experimental question is to see what differential expression changes occur in each tissue across different time points. My variable of interest is time.

The issue is that all of the samples for a given time point are on one chip (t0 and t1 on chip 1, t2 and t3 on chip 2, etc.), so I don't think I can correct for chip-to-chip variation. Does that make more sense?
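
The concern can be made concrete by looking at the rank of the design matrix. The Python sketch below is an illustration under stated assumptions, not the actual limma or ComBat computation: it assumes a model of the form expression ~ time + chip fitted within one tissue, with t0 and chip 1 as reference levels, and the helper names (`matrix_rank`, `design_row`, `chip_of_time`) are invented for this example.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Layout from the original post: each time point belongs to one chip.
chip_of_time = {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4}


def design_row(time, include_chip):
    row = [1]                                      # intercept
    row += [int(time == t) for t in range(1, 8)]   # time dummies, t0 reference
    if include_chip:
        # Chip dummies, chip 1 as reference.
        row += [int(chip_of_time[time] == c) for c in (2, 3, 4)]
    return row


# Three replicates per time point for a single tissue (24 samples).
times = [t for t in range(8) for _ in range(3)]
time_only = [design_row(t, False) for t in times]
time_plus_chip = [design_row(t, True) for t in times]

rank_time = matrix_rank(time_only)
rank_both = matrix_rank(time_plus_chip)
print(rank_time, len(time_only[0]))
print(rank_both, len(time_plus_chip[0]))
```

Adding the three chip columns brings the design to 11 columns, but the rank stays at 8, because each chip column is the sum of two time columns (for example, chip 2 = t2 + t3). In this setup a chip effect cannot be separated from the time effects that cross chips, whereas comparisons inside a chip (t1 vs t0, or cutaneous vs mucosa at the same time point) do not involve a chip difference.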

ADD REPLY • link 7.9 years ago by tleona3 ▴ 10