Question: Time point microarray data correcting for batch effects
tleona3 wrote:

Hi All,

I'm looking at an old microarray gene expression dataset and I had a question about correcting for chip to chip batch effects. The samples were run on the Affymetrix Mouse Genome 430 2.0 Array.

The experimental design is as follows: 3 biological replicates each of paired mouse cutaneous skin (c) and oral mucosa (m), taken at 8 time points (including t0 = control), for a total of 48 samples. My concern is that, according to the chip hybridization data, all samples from a given time point, both cutaneous and mucosal, were hybridized on the same chip. Because I'm looking for differential expression changes over time, I'm worried that this layout means I cannot correct for batch effects. Please see below:

Chip 1: t0_c1, t0_c2, t0_c3, t1_c1, t1_c2, t1_c3, t0_m1, t0_m2, t0_m3, t1_m1, t1_m2, t1_m3

Chip 2: t2_c1, t2_c2, t2_c3, t3_c1, t3_c2, t3_c3, t2_m1, t2_m2, t2_m3, t3_m1, t3_m2, t3_m3

Chip 3: t4_c1, t4_c2, t4_c3, t5_c1, t5_c2, t5_c3, t4_m1, t4_m2, t4_m3, t5_m1, t5_m2, t5_m3

Chip 4: t6_c1, t6_c2, t6_c3, t7_c1, t7_c2, t7_c3, t6_m1, t6_m2, t6_m3, t7_m1, t7_m2, t7_m3

When I look at the normalized data on a PCA plot, the samples group according to the chip they were run on: t0_c with t1_c, t0_m with t1_m, etc. Given how the samples were distributed across the chips, is there any way I can correct for this batch effect?
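To make the layout concrete, here is a minimal sketch (Python, standard library only) that rebuilds the 48 sample labels from the chip list above and cross-tabulates time and tissue against chip. The variable names are my own, and treating "chip" as the batch variable is an assumption taken from the description above.

```python
from collections import defaultdict

# Chip layout copied from the post: chip number -> time points run on it.
chip_to_times = {1: [0, 1], 2: [2, 3], 3: [4, 5], 4: [6, 7]}
tissues = ["c", "m"]      # c = cutaneous skin, m = oral mucosa
replicates = [1, 2, 3]

# One record per sample, e.g. ("t0_c1", 0, "c", 1).
samples = [
    (f"t{t}_{tissue}{rep}", t, tissue, chip)
    for chip, times in chip_to_times.items()
    for t in times
    for tissue in tissues
    for rep in replicates
]

# Cross-tabulate which chips each time point appears on,
# and which tissues each chip contains.
chips_per_time = defaultdict(set)
tissues_per_chip = defaultdict(set)
for name, t, tissue, chip in samples:
    chips_per_time[t].add(chip)
    tissues_per_chip[chip].add(tissue)

# Every time point sits on exactly one chip: time is nested within chip.
time_nested_in_chip = all(len(c) == 1 for c in chips_per_time.values())
# Both tissues appear on every chip: tissue is crossed with chip.
tissue_crossed_with_chip = all(len(ts) == 2 for ts in tissues_per_chip.values())

print(len(samples), time_nested_in_chip, tissue_crossed_with_chip)
```

So tissue is spread across all chips, but no time point is ever observed on more than one chip, which is the source of my concern.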

modified 19 months ago by mforde84 • written 19 months ago by tleona3
mforde84 wrote:

If you know the batches, you can use ComBat() from the sva R package or the removeBatchEffect() function in the limma R package. If you don't know the batches, I think sva has additional options for you (it can estimate surrogate variables from the data).

Also, I've seen people control for batch by including it as a coefficient in a linear model. For example:

expression ~ time + drug_treatment + batch + ... etc etc etc

but note this assumes normally distributed data. Since it's array data, the normalized values are most likely already log2 transformed, but read the documentation to make sure, or plot a few genes and see if they exhibit roughly normal distributions.
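One thing worth checking before fitting a model like this is whether the batch term is separable from the other terms. The sketch below (Python, standard library only) builds a dummy-coded design matrix for the layout in the original post and compares its rank with and without chip columns. The coding scheme (t0, cutaneous, and chip 1 as baselines) and the helper names are assumptions for illustration, not anything taken from limma or sva.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via exact Gaussian elimination on Fractions (no rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a pivot at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

def design_row(t, tissue, include_chip):
    chip = t // 2 + 1                                   # layout from the post: t0,t1 -> chip 1, ...
    row = [1]                                           # intercept
    row += [1 if t == k else 0 for k in range(1, 8)]    # time dummies, t0 is baseline
    row += [1 if tissue == "m" else 0]                  # tissue dummy, c is baseline
    if include_chip:
        row += [1 if chip == k else 0 for k in (2, 3, 4)]  # chip dummies, chip 1 is baseline
    return row

def build(include_chip):
    # 8 time points x 2 tissues x 3 replicates = 48 rows.
    return [design_row(t, tis, include_chip)
            for t in range(8) for tis in "cm" for _ in range(3)]

with_chip = build(True)
without_chip = build(False)
n_cols_with_chip = len(with_chip[0])
rank_with_chip = matrix_rank(with_chip)
rank_without_chip = matrix_rank(without_chip)

print(n_cols_with_chip, rank_with_chip, rank_without_chip)
```

For this layout the matrix with chip columns has 12 coefficients but rank 9, the same rank as the matrix without them. Each chip column is the sum of two time columns, so the chip term adds no independent information and its coefficients cannot be estimated separately from the time coefficients.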

modified 19 months ago • written 19 months ago by mforde84

Hi mforde84,

Thanks for your response. I'm afraid I might not have been clear in my question. I was planning to use ComBat to correct for chip-to-chip batch effects, but given the sample layout across chips (see original post) and the experimental question I'm trying to answer, I'm worried I won't be able to correct for the batch effects without altering the analysis.

My experimental question is which differential expression changes occur in each tissue across the different time points. My variable of interest is time.

The issue is that all of the samples for any given time point are on a single chip (t0 and t1 on chip 1, t2 and t3 on chip 2, etc.), so I don't think I can correct for chip-to-chip variation. Does that make more sense?
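To illustrate what I mean, here is a toy sketch (Python, standard library only) for a single gene. All of the numbers are hypothetical, chosen only for illustration, and it assumes the chip effect is a simple additive shift on the log2 scale.

```python
# Hypothetical "true" log2 time effects for one gene (illustration only).
true_time_effect = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5, 4: 1.0, 5: 0.5, 6: 0.25, 7: 0.0}
# Hypothetical additive shift introduced by each chip (illustration only).
chip_offset = {1: 0.0, 2: 0.75, 3: -0.5, 4: 0.25}

def chip_of(t):
    """Layout from the original post: t0,t1 -> chip 1; t2,t3 -> chip 2; ..."""
    return t // 2 + 1

def observed(t):
    """What would be measured: true effect plus the shift of the chip used."""
    return true_time_effect[t] + chip_offset[chip_of(t)]

def bias(t_a, t_b):
    """Observed difference minus true difference between two time points."""
    return (observed(t_a) - observed(t_b)) - (true_time_effect[t_a] - true_time_effect[t_b])

# Comparisons between time points that share a chip.
within_chip = {(a, b): bias(a, b) for a, b in [(1, 0), (3, 2), (5, 4), (7, 6)]}
# Comparisons of later time points against t0, which sit on different chips.
across_chip = {(a, 0): bias(a, 0) for a in range(2, 8)}

print(within_chip)
print(across_chip)
```

In this toy setup the within-chip comparisons (t1 vs t0, t3 vs t2, and so on) have zero bias because the chip shift cancels, while every comparison across chips is off by exactly the difference between the two chip shifts. From the data alone, that shift looks the same as a real change over time, which is why I don't see how to remove one without affecting the other.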

modified 19 months ago • written 19 months ago by tleona3