Question: Adjusting for batch effect and covariates with ComBat
0
gravatar for juara
4 months ago by
juara30
Canada
juara30 wrote:

Dear All,

my question is related to this post: Error in while (change > conv) { : missing value where TRUE/FALSE needed

I have a heterogeneous RNAseq dataset in TPMs from 66 samples and two sequencing batches (64 from one batch, 2 from the second batch). This dataset contains many disease types (categorical), sample types (categorical) and age (continuous) so I thought I can adjust for these covariates using ComBat. Ultimately, I would like to use this dataset for WGCNA to identify gene signatures predictive of survival. I have few questions. Here is what I have done so far:

First I removed samples with zero variance:

matrix_good  <- mymatrix[apply(mymatrix, 1, var) != 0, ]

Run a PCA and color the batches:

res.pca <- prcomp(t(matrix_good), scale = T)

groups <- as.factor(metadata$Sequencing.Batch)
fviz_pca_ind(res.pca, label = "none", col.ind = groups,addEllipses = FALSE)

download

While I understand there is only two samples in the second batch, I think I should adjust for them. Or do you think I should just simply remove them from the analysis?

Run a PCA and color the disease types:

groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)

disas

Some disease types cluster together (for example the dark yellow + on the bottom right) and so I need to adjust for this and possibly age.

Now I use the ComBat function to correct for batch effect while adjusting for disease type and age:

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease) + AGE, data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

I have two questions here. Shouldn't PCA analysis of combat_tpm look more homogenous than the original dataset?

res.pca_combat <- prcomp(t(combat_tpm), scale = T)
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca_combat, label = "none",col.ind = groups, addEllipses = FALSE)

disas1

Maybe I am not familiar with typical corrections with such tools, but It does not look to me that the confounding covariate is adjusted? I was expecting to see more uniform distribution of my samples.

and my second question. If I exclude AGE from my model, it gives me an error and I can not understand why.

batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease), data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)

Error in while (change > conv) {: missing value where TRUE/FALSE needed

based on the post above, I thought this error occurs when variance is very low, but here, my expression data is the same. Interestingly, in line with the post above, it seems to run well (but a lot longer) if, par.prior=FALSE. I get the same error if I quantile normalize the data before running ComBat.

I appreciate if you could help me understand this better.

Thank you

batch effect rna-seq combat • 340 views
ADD COMMENTlink written 4 months ago by juara30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1933 users visited in the last hour
_