I am analysing RNA-seq data from three groups of samples: two tumour types and normal control tissue. The design is not balanced: one of the tumour types comes from a different batch (in-house data), while the other tumour type and the normal samples were downloaded from TCGA. Despite this, I would like to remove the batch effect while retaining the biological differences between the groups.
factor 1 --> group, with 3 levels (tumour type a, tumour type b, and normal)
factor 2 --> class, with 2 levels (tumour, non tumour)
factor 3 --> batch (run in design_data), with 25 levels corresponding to 25 different runs
To do so, I am using ComBat as follows, but I am getting this error.
    modcombat <- model.matrix(~ as.factor(group) + as.factor(class), data = design_data)
    combat_data <- ComBat(dat = y_norm,
                          batch = design_data$run,
                          mod = modcombat,
                          par.prior = TRUE,
                          prior.plots = FALSE,
                          mean.only = TRUE)

    Using the 'mean only' version of ComBat
    Found25batches
    Adjusting for3covariate(s) or covariate level(s)
    Error in ComBat(dat = y_norm, batch = design_data$run, mod = modcombat, :
      At least one covariate is confounded with batch! Please remove confounded covariates and rerun ComBat
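One way to see where such confounding might come from is sketched below on made-up data. The `design_toy` table, its run labels, and the assumption that one tumour type sits alone in its own runs are all hypothetical and only mirror the description above; they are not the real `design_data`. The sketch cross-tabulates group against run and then checks matrix rank with base R, which is similar in spirit to (but not a copy of) the check ComBat performs.

```r
# Hypothetical toy design: tumour_a is alone in run1/run2 (like the in-house data),
# tumour_b and normal share run3/run4 (like the TCGA data). All values are made up.
design_toy <- data.frame(
  group = rep(c("tumour_a", "tumour_b", "normal"), each = 4),
  run   = c("run1", "run1", "run2", "run2",
            "run3", "run3", "run4", "run4",
            "run3", "run3", "run4", "run4")
)
design_toy$class <- ifelse(design_toy$group == "normal", "non_tumour", "tumour")

# 1. Cross-tabulate group by run: a run containing a single group cannot
#    separate the batch effect from that group's biology.
group_by_run <- table(design_toy$group, design_toy$run)
print(group_by_run)
runs_single_group <- colnames(group_by_run)[colSums(group_by_run > 0) == 1]

# 2. Covariate matrix on its own: class is fully determined by group,
#    so group + class is rank deficient even before batch is considered.
mod_toy <- model.matrix(~ as.factor(group) + as.factor(class), data = design_toy)
mod_is_full_rank <- qr(mod_toy)$rank == ncol(mod_toy)

# 3. Batch indicators plus the group covariate (intercept dropped):
#    a rank deficiency here means group is confounded with batch.
mod_group <- model.matrix(~ as.factor(group), data = design_toy)
batch_toy <- model.matrix(~ 0 + as.factor(run), data = design_toy)
full_toy  <- cbind(batch_toy, mod_group[, -1, drop = FALSE])
full_is_full_rank <- qr(full_toy)$rank == ncol(full_toy)

print(runs_single_group)
print(c(mod_is_full_rank = mod_is_full_rank, full_is_full_rank = full_is_full_rank))
```

Running the same three checks on the real `design_data` (with `run` as the batch column) should indicate whether the error comes from class being nested in group, from a group that only occurs in its own runs, or from both.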
I would appreciate any suggestions on how best to specify the model so that the biological signal is not lost.