Question: Detecting batch effects not via PCA
chris86 (United Kingdom, London) wrote 3.6 years ago:

Hi

I am trying to detect batch effects in my microarray samples. Each sample belongs to several groups, including one of five batches. I currently melt my expression data frame (log2 normalised values) so that sample IDs are in column 1 and expression values in column 2, then add per-sample information such as serology or batch in extra columns. I then run an ANOVA to gauge how large the batch effect is relative to the other variables.
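As a minimal sketch of the reshaping step described above, the following uses base R only. The toy matrix `expr`, the annotation table `pheno`, and their values are hypothetical stand-ins, not the poster's data; only the column names `SEROLOGY` and `HYB.BATCH` and the object name `merged` are taken from the post.

```r
# Hypothetical toy data: 6 probes x 4 samples of log2 expression values.
set.seed(1)
expr <- matrix(rnorm(24, mean = 8), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("S", 1:4)))

# Hypothetical sample annotation; column names mirror those in the post.
pheno <- data.frame(sampleID  = paste0("S", 1:4),
                    SEROLOGY  = factor(c("pos", "neg", "pos", "neg")),
                    HYB.BATCH = factor(c("B1", "B1", "B2", "B2")))

# "Melt" to long format: one row per probe/sample pair.
# as.vector() unrolls the matrix column by column, so each sample ID
# is repeated once per probe.
long <- data.frame(probe    = rep(rownames(expr), times = ncol(expr)),
                   sampleID = rep(colnames(expr), each = nrow(expr)),
                   value    = as.vector(expr))

# Attach the per-sample annotation to every expression value.
merged <- merge(long, pheno, by = "sampleID")
head(merged)
```

Note that after this step every probe of every sample becomes its own row, so a model fitted to `merged` sees probes times samples observations rather than one observation per sample.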

 

aov.ex2 <- aov(value ~ CELL.TYPE + VISIT + SEROLOGY + HYB.BATCH, data = merged)
summary(aov.ex2)

 

                 Df   Sum Sq Mean Sq F value Pr(>F)    
CELL.TYPE         4     2221   555.3  109.61 <2e-16 ***
VISIT             4      552   138.0   27.25 <2e-16 ***
SEROLOGY          1      347   347.3   68.55 <2e-16 ***
BATCH             4     2123   530.9  104.79 <2e-16 ***
Residuals   5391730 27314376     5.1                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

                 Df   Sum Sq Mean Sq F value Pr(>F)    
CELL.TYPE         4     1318   329.6  65.683 <2e-16 ***
VISIT             4      410   102.6  20.440 <2e-16 ***
SEROLOGY          1      467   466.7  93.004 <2e-16 ***
BATCH             4        3     0.6   0.128  0.972    
Residuals   5391730 27058055     5.0                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
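One rough way to read the tables above is to express each term's sum of squares as a share of the total, which is less sensitive to the very large residual degrees of freedom than the F value is. The sketch below copies the Sum Sq column of the first table; the variable names `ss` and `eta_sq_pct` are illustrative. Because `aov` reports sequential sums of squares, the shares can change if the order of terms in the formula changes.

```r
# Sums of squares copied from the first ANOVA table in the post.
ss <- c(CELL.TYPE = 2221, VISIT = 552, SEROLOGY = 347,
        BATCH = 2123, Residuals = 27314376)

# Share of the total sum of squares per term (eta squared), in percent.
eta_sq_pct <- 100 * ss / sum(ss)
print(signif(eta_sq_pct, 3))
```

With these numbers every modelled term, batch included, accounts for less than 0.01% of the total sum of squares, while the residual accounts for more than 99.9%. A large F value can therefore coexist with a very small share of the overall variance when there are millions of residual degrees of freedom.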

 

I can see that the p-value for the batch effect has gone up a lot. However, I am a bit confused: the PCA plot did not show any batch effect, yet the ANOVA gives a highly significant p-value for batch. The F value for batch is also very high, higher than for other clinical variables, which I would not expect. Any comments or thoughts? Am I doing this correctly?
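A possible contributor to the mismatch is that the model on melted data treats each of the roughly 5.4 million probe-by-sample values as an independent observation. One alternative, sketched here under that assumption and not as a definitive method, is to fit the model once per probe, so that each fit has only as many observations as there are samples, and then look at how many probes show a batch term after multiple-testing adjustment. The data, the factor names `cell` and `batch`, and the 0.05 cutoff are all hypothetical.

```r
# Hypothetical toy data: 50 probes x 20 samples, 5 batches of 4 samples.
set.seed(42)
batch <- factor(rep(paste0("B", 1:5), each = 4))
cell  <- factor(rep(c("A", "B"), times = 10))
expr  <- matrix(rnorm(50 * 20, mean = 8), nrow = 50,
                dimnames = list(paste0("probe", 1:50), paste0("S", 1:20)))

# Fit one linear model per probe and keep the p-value of the batch term.
batch_p <- apply(expr, 1, function(y) {
  fit <- lm(y ~ cell + batch)
  anova(fit)["batch", "Pr(>F)"]
})

# Adjust for testing many probes, then count probes with a batch signal.
batch_fdr <- p.adjust(batch_p, method = "BH")
sum(batch_fdr < 0.05)
```

The count on the last line, or a histogram of `batch_p`, gives a per-probe view of batch that can be compared more directly with what a PCA plot shows.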

 

Cheers,

Robert

 

 


What are the two different analyses? BATCH is not significant in the second one.

Please post some of the data so we see the structure. 

Comment written 3.6 years ago by karl.stamm
vassialk (Belarus) wrote 3.6 years ago:

Try JMP, Genesis, Expander, MeV, or the relevant Bioconductor packages; everything you need is there. Read the documentation.
