Question

Detecting batch effects not via PCA

8.7 years ago

chris86 ▴ 400

Hi

I am trying to detect batch effects in my microarray samples. Each sample belongs to several groups, including one of five batches. I am currently melting my expression data frame (log2 normalised values) so that I have the sample IDs in column 1 and the expression values in column 2. I then add information such as serology or batch in extra columns, and use an ANOVA to gauge how large the batch effect is relative to the other variables.

aov.ex2 <- aov(value ~ CELL.TYPE + VISIT + SEROLOGY + HYB.BATCH, data = merged)
summary(aov.ex2)

                 Df   Sum Sq Mean Sq F value Pr(>F)    
CELL.TYPE         4     2221   555.3  109.61 <2e-16 ***
VISIT             4      552   138.0   27.25 <2e-16 ***
SEROLOGY          1      347   347.3   68.55 <2e-16 ***
BATCH             4     2123   530.9  104.79 <2e-16 ***
Residuals   5391730 27314376     5.1                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 Df   Sum Sq Mean Sq F value Pr(>F)    
CELL.TYPE         4     1318   329.6  65.683 <2e-16 ***
VISIT             4      410   102.6  20.440 <2e-16 ***
SEROLOGY          1      467   466.7  93.004 <2e-16 ***
BATCH             4        3     0.6   0.128  0.972    
Residuals   5391730 27058055     5.0                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I can see that the p-value for the batch effect has gone up a lot. However, I am a bit confused: the PCA plot did not show any batch effect, yet the ANOVA gives a highly significant p-value for batch. The F value for batch is also very high, higher than for the clinical variables, which I would not really expect. Any comments or thoughts? Am I doing this correctly?
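One way to put the large F value in context is to look at effect size rather than the p-value alone. The sketch below takes the Sum Sq column from the first table above and computes eta-squared, the share of the total sum of squares attributed to each term. It assumes the table is standard sequential (type I) `aov` output, so the shares depend on the order of terms in the formula. Treat it as a rough sanity check, not a substitute for a per-gene model.

```r
# Sum Sq column copied from the first ANOVA table in the post
ss <- c(CELL.TYPE = 2221, VISIT = 552, SEROLOGY = 347,
        BATCH = 2123, Residuals = 27314376)

# Eta-squared: share of the total sum of squares attributed to each term
eta_sq <- ss / sum(ss)
print(signif(eta_sq, 3))
```

On these numbers every modelled term, batch included, accounts for well under 0.1% of the total variation. With roughly 5.4 million residual degrees of freedom, even a very small mean shift yields a large F value, which would be consistent with a PCA plot that shows no visible batch structure.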

Cheers,
Robert

next-gen • 2.1k views

updated 18 months ago by Ram 43k • written 8.7 years ago by chris86 ▴ 400

What are the two different analyses? BATCH is not significant in the second one.

Please post some of the data so we can see the structure.
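As an illustration of the kind of structure that would help, here is a minimal sketch on simulated data. Everything in it is an assumption: the matrix size, the factor levels, and the object names `expr`, `annot` and `long` are hypothetical stand-ins, and only `merged`, `CELL.TYPE` and `HYB.BATCH` are taken from the question. It builds the long table described above using base R only, then prints the first rows and a cross-tabulation of batch against cell type.

```r
# Simulated stand-in for the real data; all sizes and levels are assumptions
set.seed(1)
n_probes  <- 200
n_samples <- 20

# Toy log2 expression matrix: probes in rows, samples in columns
expr <- matrix(rnorm(n_probes * n_samples, mean = 8, sd = 2),
               nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)),
                               paste0("S", seq_len(n_samples))))

# Per-sample annotation (hypothetical levels)
annot <- data.frame(
  sampleID  = colnames(expr),
  CELL.TYPE = factor(rep(c("A", "B", "C", "D"), times = 5)),
  HYB.BATCH = factor(rep(paste0("batch", 1:5), each = 4))
)

# Long ("melted") format: one row per probe-by-sample measurement.
# as.vector() unrolls the matrix column by column, so each sample ID
# is repeated once per probe.
long <- data.frame(
  sampleID = rep(colnames(expr), each = nrow(expr)),
  value    = as.vector(expr)
)
merged <- merge(long, annot, by = "sampleID")

# First rows of the merged table
print(head(merged))

# How batch overlaps with cell type, counted per sample
design <- table(annot$HYB.BATCH, annot$CELL.TYPE)
print(design)
```

The cross-tabulation is the useful part: if a batch contains only one cell type or one visit, batch and biology are confounded and the ANOVA cannot separate them.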

updated 18 months ago by Ram 43k • written 8.7 years ago by karl.stamm 4.1k