Adjusting for batch effect and covariates with ComBat
19 months ago
juara

Dear All,

My question is related to this post: Error in while (change > conv) { : missing value where TRUE/FALSE needed

I have a heterogeneous RNA-seq dataset in TPM from 66 samples sequenced in two batches (64 samples in one batch, 2 in the other). The dataset covers many disease types (categorical), several sample types (categorical) and a range of ages (continuous), so I thought I could adjust for these covariates using ComBat. Ultimately, I would like to use this dataset for WGCNA to identify gene signatures predictive of survival. I have a few questions. Here is what I have done so far:

First, I removed genes (rows) with zero variance across samples:

matrix_good <- mymatrix[apply(mymatrix, 1, var) != 0, ]  # rows are genes, columns are samples
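ComBat estimates a mean and a variance for every gene separately within each batch, and the second batch here has only two samples, so a gene can vary across the whole dataset yet still be constant inside one batch. A stricter filter is sketched below as an assumption, not a known fix: `drop_constant_genes` is a hypothetical helper, and it assumes genes are rows and samples are columns, as in `matrix_good`. Depending on the sva version, ComBat may already handle some of these genes itself, so treat this as a check rather than a requirement.

```r
# Hypothetical helper: keep genes (rows) that vary overall AND within every batch.
# Assumes genes are rows and samples are columns, as in matrix_good.
drop_constant_genes <- function(mat, batch) {
  batch <- as.factor(batch)
  stopifnot(ncol(mat) == length(batch))
  # TRUE for genes with non-zero variance across all samples
  keep <- apply(mat, 1, var) > 0
  for (b in levels(batch)) {
    cols <- which(batch == b)
    if (length(cols) > 1) {
      # A gene that is constant inside this batch gets a zero per-batch variance
      keep <- keep & apply(mat[, cols, drop = FALSE], 1, var) > 0
    }
  }
  mat[keep, , drop = FALSE]
}
```

With the objects from the post, this would be called as `drop_constant_genes(mymatrix, metadata$Sequencing.Batch)`.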

Next, I ran a PCA and colored the samples by sequencing batch:

library(factoextra)  # provides fviz_pca_ind()

res.pca <- prcomp(t(matrix_good), scale. = TRUE)  # samples as rows; needs no zero-variance genes

groups <- as.factor(metadata$Sequencing.Batch)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)
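To put a number on how strongly batch (or disease, or age) drives each principal component, instead of judging only by eye, one option is to regress each PC score on the covariate and look at the R². This is a minimal sketch; `pc_r2` is a hypothetical helper name. With only two samples in one batch, the batch R² is driven entirely by those two points, so it is best read alongside the plot.

```r
# Hypothetical helper: R^2 of each of the first k PC scores regressed on a covariate.
# `pca` is a prcomp() result; `covariate` is a factor (batch, disease) or numeric (age).
pc_r2 <- function(pca, covariate, k = 5) {
  k <- min(k, ncol(pca$x))
  sapply(seq_len(k), function(i) {
    # Fraction of the variance of PC i explained by the covariate
    summary(lm(pca$x[, i] ~ covariate))$r.squared
  })
}
```

For example, `pc_r2(res.pca, as.factor(metadata$Sequencing.Batch))` or `pc_r2(res.pca, metadata$AGE)`.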


I understand there are only two samples in the second batch, but I think I should still adjust for batch. Or do you think I should simply remove those two samples from the analysis?

Then I colored the same PCA by disease type:

groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)


Some disease types cluster together (for example, the dark yellow + symbols at the bottom right), so I think I need to adjust for disease type and possibly age.

Next, I used the ComBat function (from the sva package) to correct for the batch effect while adjusting for disease type and age:

library(sva)  # provides ComBat()

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease) + AGE, data = metadata)
combat_tpm <- ComBat(dat = matrix_good, batch = batch, mod = modcombat, par.prior = TRUE, prior.plots = FALSE)
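One assumption worth checking before ComBat: its parametric model treats each gene's values as roughly normally distributed, which raw TPMs (right-skewed, with many zeros) usually are not, so a log transform such as log2(TPM + 1) is commonly applied first. The pseudocount of 1 is a convention rather than a requirement, and `log_tpm` is a hypothetical helper, not an sva function.

```r
# Hypothetical helper: log2-transform a TPM matrix (genes x samples) before ComBat.
# The pseudocount avoids log2(0); 1 is a common but arbitrary choice.
log_tpm <- function(tpm, pseudocount = 1) {
  stopifnot(all(tpm >= 0, na.rm = TRUE))
  log2(tpm + pseudocount)
}

# Sketch of use with the objects from the post (requires the sva package):
# log_mat    <- log_tpm(matrix_good)
# combat_log <- ComBat(dat = log_mat, batch = batch, mod = modcombat,
#                      par.prior = TRUE, prior.plots = FALSE)
```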

I have two questions here. First, shouldn't a PCA of combat_tpm look more homogeneous than that of the original dataset?

res.pca_combat <- prcomp(t(combat_tpm), scale. = TRUE)
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca_combat, label = "none", col.ind = groups, addEllipses = FALSE)


Maybe I am not familiar with what such corrections typically look like, but it does not look to me as if the confounding covariate has been adjusted for. I was expecting a more uniform distribution of samples.
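In ComBat, the covariates passed through `mod` are ones to protect: only the batch effect is removed, and the disease and age effects are deliberately kept, so disease-type clusters persisting after correction is consistent with that design. To illustrate the idea, here is a deliberately simplified, location-only batch adjustment. It is NOT ComBat (no empirical Bayes shrinkage, no per-batch variance scaling), and `adjust_batch_mean` is a hypothetical name; it assumes the combined design has full column rank.

```r
# Simplified, location-only batch adjustment: NOT ComBat (no empirical Bayes
# shrinkage, no per-batch variance scaling). Covariates in `mod` stay in the
# data; only the fitted batch terms are subtracted.
adjust_batch_mean <- function(mat, batch, mod) {
  batch <- as.factor(batch)
  stopifnot(ncol(mat) == length(batch), nrow(mod) == length(batch))
  # Sum-to-zero batch contrasts: the batch effects sum to zero across batch levels
  bdesign <- model.matrix(~ batch, contrasts.arg = list(batch = "contr.sum"))[, -1, drop = FALSE]
  design <- cbind(mod, bdesign)
  # Least-squares coefficients for all genes at once (genes are columns of t(mat));
  # assumes `design` has full column rank
  coefs <- solve(crossprod(design), crossprod(design, t(mat)))
  batch_rows <- seq(ncol(mod) + 1, ncol(design))
  # Subtract only the batch part of the fit
  mat - t(bdesign %*% coefs[batch_rows, , drop = FALSE])
}
```

In a noise-free toy gene where disease "y" adds 3 and batch "B" adds 5, the adjusted values lose the batch difference but keep the disease difference of 3, so coloring such data by disease still shows disease groups. One way to check the correction itself is to color the corrected PCA by batch instead.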

My second question: if I exclude AGE from the model, ComBat gives an error and I cannot understand why.

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease), data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

Error in while (change > conv) { : missing value where TRUE/FALSE needed

Based on the post above, I thought this error occurs when the variance is very low, but here my expression data are identical in both runs; only the model changes. Interestingly, and in line with the post above, it runs without error (but much more slowly) with par.prior=FALSE. I get the same error if I quantile-normalize the data before running ComBat.
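One way to probe the low-variance idea is to look at residuals rather than raw values. As I understand ComBat's implementation, it fits the batch indicators and the `mod` covariates jointly by least squares, and its per-batch variance estimate reflects how much each gene still varies within a batch after that fit. With only two samples in the second batch, that estimate rests on just two values per gene. The sketch below computes those within-batch residual variances; `within_batch_resid_var` is a hypothetical helper, not part of sva, and this is a diagnostic for a hypothesis, not a confirmed explanation of the error.

```r
# Hypothetical diagnostic (not part of sva): per-batch variance of each gene's
# least-squares residuals after fitting batch indicators plus the `mod` covariates.
within_batch_resid_var <- function(mat, batch, mod) {
  batch <- as.factor(batch)
  stopifnot(ncol(mat) == length(batch), nrow(mod) == length(batch))
  # Joint design: one indicator column per batch, then the covariates
  design <- cbind(model.matrix(~ 0 + batch), mod)
  # qr.resid() handles the redundant intercept via the rank-revealing QR
  res <- qr.resid(qr(design), t(mat))  # samples x genes
  colnames(res) <- rownames(mat)
  # One column per batch: variance of each gene's residuals inside that batch
  sapply(levels(batch), function(b) apply(res[batch == b, , drop = FALSE], 2, var))
}
```

Running it on `matrix_good` with each of the two `modcombat` matrices from the post and comparing `apply(v, 2, summary)` would show whether near-zero within-batch variance is actually what differs between the failing and the working model.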

I would appreciate it if you could help me understand this better.

Thank you
