Help with multiple batch effects
3.3 years ago
fp89 ▴ 30

Hello, I have an expression matrix of 1208 samples (1095 tumor and 113 normal) downloaded from TCGA. I know there are 3 batch effects: type, plateId and TSS. I've tried to correct for them with ComBat, but I need a little help with the model.matrix.

batch<-as.data.frame(cbind(samples,plateId,group,TSS),as.is=T)[,-1]

#correct for group
mod.1<- model.matrix(~plateId+TSS, data=batch)
bat.1<- ComBat(dat=dati, batch$group, mod.1, mean.only = TRUE, par.prior=TRUE, prior.plots=FALSE)

## correct for plateId
mod.2<- model.matrix(~group+TSS, data=batch)
bat.2<- ComBat(dat=bat.1, batch$plateId, mod.2, mean.only = TRUE,par.prior=TRUE, prior.plots=FALSE)

## correct for TSS
mod.3<- model.matrix(~group+plateId, data=batch)
bat.3<- ComBat(dat=bat.2, batch$TSS, mod.3, mean.only = TRUE,par.prior=TRUE, prior.plots=FALSE)


There is something wrong. The error message says:

Error in ((dat - t(design %*% B.hat))^2) %*% rep(1/n.array, n.array) :
requires numeric/complex matrix/vector arguments


Is there anyone who can help me? I'm a student. Thanks in advance.
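
A note for readers hitting the same error: the message comes from base R's `%*%`, which refuses non-numeric inputs such as a data frame or a character matrix. Whether that is what happened here is not confirmed anywhere in this thread. The sketch below only illustrates one possibility, namely that the expression object was a data frame rather than a numeric matrix. The objects `dat_df`, `dat_mat` and `weights` are hypothetical toy stand-ins, not the poster's data.

```r
# Toy stand-in for an expression table read in as a data frame (hypothetical data).
dat_df <- data.frame(s1 = c(1.2, 3.4, 5.6), s2 = c(2.1, 4.3, 6.5))
weights <- rep(1 / ncol(dat_df), ncol(dat_df))

# Matrix multiplication on a data frame is rejected by %*%.
res <- try(dat_df %*% weights, silent = TRUE)
failed <- inherits(res, "try-error")
print(failed)

# Converting to a numeric matrix first lets the same operation run.
dat_mat <- as.matrix(dat_df)
stopifnot(is.numeric(dat_mat))
row_means <- dat_mat %*% weights
print(row_means)
```

Running `class(dati)` and `str(dati)` on the real object would show whether this applies.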

combat batch effects sva • 3.1k views
3.3 years ago

Going by the numbers, looks like the breast cancer TCGA data. I have analysed this data many times and never noticed an effect of type, plateId, or TSS on the expression values. What evidence do you have that suggests they are biasing the counts?

To adjust for batch effects, please avoid the use of ComBat at all costs. You have a couple of options:

Kevin


Hey, fair enough. It's just not something that I have seen anyone else doing. If you want to adjust for a batch effect, though, you should first check that the effect exists. It may very well not exist, or it may exist in complex ways that can only be remedied by improving the study design. Batch effects that affect samples unequally are obviously more difficult to model and adjust.
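
One way to check whether a suspected effect exists could be sketched as follows: run a PCA and test whether the leading components are associated with the candidate batch label. This is a minimal sketch on simulated data, not TCGA data. The names `expr` and `plate` are hypothetical, and the plate effect is injected artificially so that there is something to detect.

```r
# Simulated stand-in for an expression matrix (genes x samples); not real data.
set.seed(1)
n_genes <- 200
n_samples <- 40
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Hypothetical batch label (e.g. a plate ID) with two levels.
plate <- factor(rep(c("A", "B"), each = n_samples / 2))

# Inject an artificial plate effect into the first 50 genes.
expr[1:50, plate == "B"] <- expr[1:50, plate == "B"] + 2

# prcomp expects samples in rows, so transpose the matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# One-way ANOVA of each of the first 5 PCs against the batch label.
pvals <- sapply(1:5, function(i) {
  fit <- aov(pca$x[, i] ~ plate)
  summary(fit)[[1]][["Pr(>F)"]][1]
})
names(pvals) <- paste0("PC", 1:5)
print(signif(pvals, 3))
```

A small p-value for a leading PC suggests that the label tracks a major axis of variation. If the label is confounded with biology (as tumor versus normal is), the test cannot tell the two apart.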

I had the same issue with RNA-seq data, so I used svaseq. There was almost no change in the data even after removing the batch effect, so my understanding is that in RNA-seq the effect is usually small, I guess.

Hi, I'm a bit confused. How can I detect the presence of batch effects? With PCA, OK, but how do I interpret the plot? This is my PCA. Red is tumor and blue is normal samples.

I would suggest going for unsupervised clustering; this figure looks very confusing.
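
As a rough illustration of that suggestion, samples can be clustered hierarchically and the resulting clusters cross-tabulated against a known label. This is a minimal sketch on simulated data. The names `expr` and `group` are hypothetical, and Euclidean distance with average linkage is just one assumed choice among many.

```r
# Simulated genes x samples matrix with an artificial group difference.
set.seed(2)
expr <- matrix(rnorm(100 * 30), nrow = 100)
colnames(expr) <- paste0("S", seq_len(ncol(expr)))
group <- factor(rep(c("tumor", "normal"), times = c(20, 10)))
expr[1:30, group == "normal"] <- expr[1:30, group == "normal"] + 3

# Euclidean distance between samples, then average-linkage clustering.
d <- dist(t(expr))
hc <- hclust(d, method = "average")

# Cut the tree into two clusters and compare them with the known label.
clusters <- cutree(hc, k = 2)
tab <- table(cluster = clusters, group = group)
print(tab)
```

The same cross-tabulation can be repeated with plateId or TSS in place of `group` to see whether the clusters follow those labels instead. `plot(hc)` draws the dendrogram.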

When I saw your figure, I said 'Ouch...!' - it does look a bit messy, but it's just due to the labels.

When I look closer, I do not see anything unusual: the 11 (blue) samples are normal tissue, whilst the 01 (red) samples are tumours (assuming you are using 11 and 01 to refer to the TCGA barcodes). I see this same distribution for each and every TCGA dataset that I analyse.

A batch effect could be inferred from PCA if a large proportion of the variation is explained by PC1. That proportion could be upward of 90%.
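
The proportion of variation explained by each component can be read off a `prcomp` fit. This is a minimal sketch on simulated noise, so the values it prints say nothing about TCGA; `expr` is a hypothetical stand-in matrix.

```r
# Simulated genes x samples matrix (pure noise, for illustration only).
set.seed(3)
expr <- matrix(rnorm(150 * 24), nrow = 150)

# PCA with samples in rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Variance of each PC divided by the total variance.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(head(var_explained, 5), 3))
```

`summary(pca)` reports the same quantities in its "Proportion of Variance" row.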

Thank you. These are my clustering results for group, plateId and TSS.

Thanks for sharing and well done! - those are pretty cool dendrograms. Also, apologies if my comment (the 'Ouch...!' part) was interpreted in a negative light. I still don't see any major reason for doing adjustments based on any of these (group, plateId, TSS). The group is different because those are normal tissue samples, so they are expected to be different. What do you think?