Question: How To Define Batch And Covariate To Remove Batch Effects With Combat.R
5.8 years ago by
European Union
fbrundu280 wrote:

Hi all, newbie here.

I am trying to remove batch effects from an Affymetrix microarray data set (this one) using ComBat.R.

I loaded the CEL files into dChip and ran dChip's "Normalize & Model" and "Export expression value" steps. This gave me an expression value file, "dChip_signal.xls", which contains both signal and call values and is the first parameter passed to ComBat.R.

The second parameter is a sample information file, a tab-separated file laid out like this:

Array name   Sample name   Batch   Covariate1(treatment) ...
Array1       Sample1       1       Tissue1               ...

The reference for these parameters is here.

I have read that "Batch effects are technical sources of variation that have been added to the samples during handling" (here). I do not understand how to assess and determine the batch and the covariates for each array of the dataset (as far as I know, each array is identified by a CEL file). Am I missing anything?

I will provide more information if needed.


microarray affymetrix • 4.8k views
5.8 years ago by
Sydney, Australia
Neilfws48k wrote:

You need to think about how the arrays were processed. Were they scanned on different days (CEL files should include scan date information)? In different labs by different people? If so, there is a potential for batch effects.
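
As a rough illustration of this idea, the sketch below groups arrays by scan date and writes a sample information file in the layout shown in the question. It assumes the scan dates have already been read from the CEL headers with some other tool. The array names, dates, tissue labels, and the helper name `assign_batches` are all hypothetical, and treating each distinct scan day as one batch is only an assumption.

```python
import csv
import io
from datetime import date

# Hypothetical scan dates, assumed to have been extracted from the CEL headers beforehand.
scan_dates = {
    "Array1.CEL": date(2005, 3, 14),
    "Array2.CEL": date(2005, 3, 14),
    "Array3.CEL": date(2005, 4, 2),
    "Array4.CEL": date(2005, 4, 2),
}

# Hypothetical covariate (e.g. tissue or treatment) for each array.
covariate = {
    "Array1.CEL": "Tissue1",
    "Array2.CEL": "Tissue2",
    "Array3.CEL": "Tissue1",
    "Array4.CEL": "Tissue2",
}


def assign_batches(dates):
    """Give every distinct scan date a batch number, in chronological order."""
    batch_of_date = {d: i for i, d in enumerate(sorted(set(dates.values())), start=1)}
    return {array: batch_of_date[d] for array, d in dates.items()}


batches = assign_batches(scan_dates)

# Build the tab-separated sample information file in memory.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Array name", "Sample name", "Batch", "Covariate1"])
for n, array in enumerate(sorted(scan_dates), start=1):
    writer.writerow([array, f"Sample{n}", batches[array], covariate[array]])

sample_info = buffer.getvalue()
print(sample_info)
```

Arrays scanned on neighbouring days may still belong to the same processing run, so the grouping rule should be checked against whatever is known about how the lab handled the samples.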


Thanks for the suggestion. However, I think this data almost certainly has batch effects, since some of the works I saw that use it say their processing pipeline includes removing batch effects. Anyway, I will try to figure out how to divide the data into batches. One solution could be to put each array series in a different batch, or maybe to give each CEL file its own batch, but I am still unsure whether that would work.

Reply written 5.8 years ago by fbrundu

You should not divide them yourself arbitrarily. There should be a batch factor that you already know, which comes from running the samples on different days (one dimension), on different machines (another dimension), etc. Does that help?
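
If more than one such dimension is known, one possible way to end up with a single batch column is to give each observed combination its own label. The sketch below does this for hypothetical scan-day and scanner annotations. The names and values are made up for illustration, and whether combining the dimensions is appropriate depends on the experiment.

```python
# Hypothetical annotations: (scan day, scanner) for each array.
annotations = {
    "Array1": ("day1", "scannerA"),
    "Array2": ("day1", "scannerB"),
    "Array3": ("day2", "scannerA"),
    "Array4": ("day1", "scannerA"),
}

# Number each distinct (day, scanner) combination, in sorted order.
combos = sorted(set(annotations.values()))
batch_id = {combo: i for i, combo in enumerate(combos, start=1)}

# Map every array to the batch number of its combination.
batch = {array: batch_id[combo] for array, combo in annotations.items()}

for array in sorted(batch):
    print(array, batch[array])
```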

Reply written 5.6 years ago by Farshad0