Hello, I have an expression matrix of 1208 samples (1095 tumor and 113 normal) downloaded from TCGA. I know there are 3 batch effects: type, plateId and TSS. I've tried to correct for them with ComBat, but I need a little help with the model.matrix.

```
batch <- as.data.frame(cbind(samples, plateId, group, TSS), as.is = TRUE)[, -1]
## correct for group
mod.1 <- model.matrix(~ plateId + TSS, data = batch)
bat.1 <- ComBat(dat = dati, batch$group, mod.1, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)
## correct for plateId
mod.2 <- model.matrix(~ group + TSS, data = batch)
bat.2 <- ComBat(dat = bat.1, batch$plateId, mod.2, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)
## correct for TSS
mod.3 <- model.matrix(~ group + plateId, data = batch)
bat.3 <- ComBat(dat = bat.2, batch$TSS, mod.3, mean.only = TRUE, par.prior = TRUE, prior.plots = FALSE)
```

There is something wrong. The error message says:

```
Error in ((dat - t(design %*% B.hat))^2) %*% rep(1/n.array, n.array) :
requires numeric/complex matrix/vector arguments
```

Is there anyone who can help me? I'm a student. Thanks in advance.
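
One possibility worth ruling out, offered as a guess rather than a diagnosis: `%*%` raises this kind of error when one operand is not a numeric matrix, for example when the expression data is a data.frame or was read in as character. The sketch below reproduces the situation with a toy data.frame (`toy` is a made-up stand-in, not the real `dati`) and shows the coercion that avoids it.

```r
# Toy stand-in for an expression table: 3 genes x 4 samples, stored as a data.frame.
toy <- data.frame(s1 = c(1, 2, 3), s2 = c(2, 3, 4),
                  s3 = c(3, 4, 5), s4 = c(4, 5, 6),
                  row.names = c("geneA", "geneB", "geneC"))
n <- ncol(toy)

# A matrix product on a data.frame fails; capture the message instead of stopping.
err <- tryCatch((toy^2) %*% rep(1 / n, n),
                error = function(e) conditionMessage(e))
print(err)

# Coercing to a numeric matrix first makes the product well defined.
toy_mat <- as.matrix(toy)
stopifnot(is.numeric(toy_mat))
row_means_sq <- (toy_mat^2) %*% rep(1 / n, n)  # mean of squares per gene
print(row_means_sq)
```

If this is the cause, checking `class(dati)` and `is.numeric(as.matrix(dati))` before calling ComBat should confirm or rule it out.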

Hi Kevin, thank you. This page from MD Anderson suggests different batch types.

Hey, fair enough. It's just not something that I have seen anyone else doing. If you want to adjust for a batch effect, though, you should first check that the effect exists. It may very well not exist, or it may exist in complex ways that can only be remedied by improving the study design. Batch effects that affect samples unequally are obviously more difficult to model and adjust for.

I had this issue with RNA-seq data too, so I used svaseq. There was almost no change in the data even after removing the batch effect, so my understanding is that the effect is not large in RNA-seq, I guess.

Hi, I'm a bit confused. How can I detect the presence of batch effects? With PCA, OK, but how do I interpret the plot? This is my PCA: tumor samples are red and normal samples are blue.

I would suggest going for unsupervised clustering; this figure looks very confusing.
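
As a rough illustration of the clustering idea, here is a minimal sketch in base R on simulated data. The matrix `expr`, the `plate` labels and the artificial shift are all made up for the example; with real data you would cluster your own expression matrix and compare against plateId, TSS or group.

```r
set.seed(1)
# Simulated log-expression: 50 genes x 12 samples (purely illustrative data).
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:12)))
plate <- factor(rep(c("plateA", "plateB"), each = 6))  # hypothetical batch labels

# Add a strong artificial shift to one plate so that an effect is present.
expr[, plate == "plateB"] <- expr[, plate == "plateB"] + 3

# Cluster samples (columns) on Euclidean distance with average linkage.
hc <- hclust(dist(t(expr)), method = "average")
clusters <- cutree(hc, k = 2)

# Cross-tabulate cluster membership against the batch label.
tab <- table(cluster = clusters, plate = plate)
print(tab)
# plot(hc, labels = as.character(plate))  # dendrogram labelled by plate
```

If the clusters line up with a technical label such as plate rather than with biology, that hints at a batch effect; if they line up with tumor vs. normal, that is expected biological signal.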

When I saw your figure, I said 'Ouch...!' - it does look a bit messy, but it's just due to the labels.

When I look closer, I do not see anything unusual: the 11 (blue) samples are normal tissue, whilst the 01 (red) samples are tumours (assuming you are using 11 and 01 to refer to the TCGA barcodes). So, nothing looks unusual; I see this same distribution in each and every TCGA dataset that I analyse.

A batch effect could be inferred from PCA if a large proportion of the variation is explained by PC1. In such cases, the proportion of variance explained by PC1 could be upward of 90%.
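
To make the "proportion of variation explained" idea concrete, here is a minimal sketch using `prcomp` from base R on simulated data. The matrix `expr`, the labels in `batch_lab` and the size of the shift are assumptions made up for illustration, not values from the TCGA data.

```r
set.seed(42)
# Simulated expression: 200 genes x 10 samples (illustrative only).
expr <- matrix(rnorm(200 * 10), nrow = 200)
batch_lab <- factor(rep(c("b1", "b2"), each = 5))  # hypothetical batch labels
expr[, batch_lab == "b2"] <- expr[, batch_lab == "b2"] + 2  # artificial batch shift

# prcomp expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:3], 3))

# Mean PC1 score per batch: opposite signs mean PC1 separates the batches.
pc1_by_batch <- tapply(pca$x[, 1], batch_lab, mean)
print(pc1_by_batch)
```

A dominant PC1 only points to a batch effect if the samples also separate along it by a technical label; if they separate by tumor vs. normal, it is more likely biology.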

Thank you. These are my clustering results for group, plateId and TSS.

Thanks for sharing and well done! - those are pretty cool dendrograms. Also, apologies if my comment (the 'Ouch...!' part) was interpreted in a negative light. I still don't see any major reason for doing adjustments based on any of these (group, plateId, TSS). The group is different because those are normal tissue samples, so they are expected to be different. What do you think?