Hi

I am trying to detect batch effects in my microarray samples. Each sample belongs to several groups, including one of five batches. I currently melt my expression data frame (log2-normalised values) so that I have sample IDs in column 1 and expression values in column 2, and then add information such as serology or batch in extra columns. I then use an ANOVA to gauge how large the batch effect is relative to the other variables.
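
For readers following along, the reshaping described above might look roughly like the sketch below. It uses simulated data and base R only; the matrix `expr`, the annotation table `pheno`, and all factor levels are made up for illustration, and only the model formula is taken from the post.

```r
# Simulated stand-in for a log2-normalised expression matrix (genes x samples).
set.seed(1)
n_genes   <- 200
n_samples <- 20
expr <- matrix(rnorm(n_genes * n_samples, mean = 8),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("S", seq_len(n_samples))))

# Hypothetical sample annotation; column names follow the model in the post.
pheno <- data.frame(
  sampleID  = colnames(expr),
  CELL.TYPE = factor(rep(c("A", "B", "C", "D", "E"), times = 4)),
  VISIT     = factor(rep(1:4, each = 5)),
  SEROLOGY  = factor(rep(c("neg", "pos"), length.out = n_samples)),
  HYB.BATCH = factor(sample(rep(1:5, 4)))
)

# "Melt" to long format: one row per (gene, sample) expression value.
# as.vector() walks the matrix column by column, so each sample ID is
# repeated once per gene to stay aligned with the values.
long <- data.frame(
  sampleID = rep(colnames(expr), each = nrow(expr)),
  value    = as.vector(expr)
)

# Attach the sample-level covariates to every expression value.
merged <- merge(long, pheno, by = "sampleID")

# One ANOVA over all values pooled across genes, as in the post.
fit <- aov(value ~ CELL.TYPE + VISIT + SEROLOGY + HYB.BATCH, data = merged)
summary(fit)
```

Note that this pools every gene into a single model, so the residual degrees of freedom scale with the number of genes times the number of samples.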

aov.ex2 = aov(value~CELL.TYPE+VISIT+SEROLOGY+HYB.BATCH,data=merged)

                   Df   Sum Sq Mean Sq F value Pr(>F)
    CELL.TYPE       4     2221   555.3  109.61 <2e-16 ***
    VISIT           4      552   138.0   27.25 <2e-16 ***
    SEROLOGY        1      347   347.3   68.55 <2e-16 ***
    BATCH           4     2123   530.9  104.79 <2e-16 ***
    Residuals 5391730 27314376     5.1
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                   Df   Sum Sq Mean Sq F value Pr(>F)
    CELL.TYPE       4     1318   329.6  65.683 <2e-16 ***
    VISIT           4      410   102.6  20.440 <2e-16 ***
    SEROLOGY        1      467   466.7  93.004 <2e-16 ***
    BATCH           4        3     0.6   0.128  0.972
    Residuals 5391730 27058055     5.0
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I can see that the p-value for the batch effect has gone up a lot. However, I am a bit confused: the PCA plot did not show any batch effect, yet the ANOVA gives a highly significant p-value for batch. The F value for batch is also very high, higher than for other clinical variables, which I would not really expect. Any comments or thoughts? Am I doing this correctly?

Cheers,

Robert

What are the two different analyses? BATCH is not significant in the second one.

Please post some of the data so we can see the structure.
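
As a side note on the large F value: with roughly 5.4 million residual degrees of freedom, even a very small effect can produce a large F and a tiny p-value. A rough sketch, using only the sums of squares from the first ANOVA table in the question, of the share of total variation attributed to each term. Eta squared is my choice of summary here, not something from the original post, and because `aov` reports sequential sums of squares, the shares depend on the order of terms in the formula.

```r
# Sums of squares copied from the first ANOVA table in the question.
ss <- c(CELL.TYPE = 2221,
        VISIT     = 552,
        SEROLOGY  = 347,
        BATCH     = 2123,
        Residuals = 27314376)

# Eta squared: each term's sum of squares as a fraction of the total.
eta_sq <- ss / sum(ss)

# Express as percentages of total variation.
round(100 * eta_sq, 4)
```

On these numbers every term, batch included, accounts for well under 0.1% of the total variation, which may be why nothing shows up on the PCA plot even though the test is highly significant.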
