I have downloaded several GSE series from the GEO database, all from Homo sapiens and from the same platform and tissue. I merged the datasets and performed RMA normalization on the merged set. However, I still see a lot of unexpected variation between the datasets, which I suspect is due to batch effects. How can I remove it?

If you believe there are 'extraneous' effects (unwanted sources of variation) in the dataset that could bias your results, then you could perform Surrogate Variable Analysis (SVA), which essentially identifies a set of variables that 'capture' this unwanted variation. The idea is then to include these surrogate variables in your design formula so that your model controls for them.
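
To make the idea concrete, here is a minimal sketch in Python with NumPy. It is not the full SVA algorithm (in R, the Bioconductor `sva` package handles gene weighting and choosing the number of surrogate variables); it only illustrates the core step on simulated data: regress out the known covariates, take the leading singular vectors of the residuals as surrogate variables, and append them to the design matrix. The function name `estimate_surrogate_variables`, the simulated `condition` and `batch` vectors, and all effect sizes are assumptions made up for this illustration.

```python
import numpy as np

def estimate_surrogate_variables(expr, design, n_sv):
    """Simplified surrogate variable estimate (illustration only, not full SVA).

    expr:   genes x samples expression matrix
    design: samples x covariates matrix of known variables (with intercept)
    n_sv:   number of surrogate variables to return
    """
    # Fit the known covariates to every gene by least squares.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    residuals = expr.T - design @ coef  # samples x genes
    # The leading left singular vectors of the residuals describe the
    # dominant variation across samples that the design does not explain.
    u, _, _ = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_sv]

# Simulated data (hypothetical): 500 genes, 20 samples, a known biological
# condition, and a hidden batch variable that the analyst has not recorded.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 20
condition = np.tile([0, 1], n_samples // 2)   # alternates 0, 1, 0, 1, ...
batch = np.repeat([0, 1], n_samples // 2)     # first half 0, second half 1

expr = (
    rng.normal(size=(n_genes, n_samples))                        # noise
    + np.outer(rng.normal(scale=1.0, size=n_genes), condition)   # biology
    + np.outer(rng.normal(scale=2.0, size=n_genes), batch)       # batch
)

# Design with only the known covariates: intercept + condition.
design = np.column_stack([np.ones(n_samples), condition])

# Estimate one surrogate variable and add it to the design matrix.
sv = estimate_surrogate_variables(expr, design, n_sv=1)
design_with_sv = np.column_stack([design, sv])

# Check how well the surrogate variable tracks the hidden batch.
corr = abs(np.corrcoef(sv[:, 0], batch)[0, 1])
print(f"|correlation| between surrogate variable and batch: {corr:.2f}")
```

In this simulation the surrogate variable should track the hidden batch closely, so fitting the per-gene model with `design_with_sv` instead of `design` adjusts for it without removing the condition effect. On real data you would not know the batch in advance, which is exactly the situation SVA is meant for.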


Thank you very much, that seems to work.

