Dear All,

my question is related to this post: Error in while (change > conv) { : missing value where TRUE/FALSE needed

I have a heterogeneous RNAseq dataset in TPMs from 66 samples and two sequencing batches (64 from one batch, 2 from the second batch). This dataset contains many disease types (categorical), sample types (categorical) and age (continuous) so I thought I can adjust for these covariates using ComBat. Ultimately, I would like to use this dataset for WGCNA to identify gene signatures predictive of survival. I have few questions. Here is what I have done so far:

First I removed samples with zero variance:

```
matrix_good <- mymatrix[apply(mymatrix, 1, var) != 0, ]
```

Run a PCA and color the batches:

```
res.pca <- prcomp(t(matrix_good), scale = T)
groups <- as.factor(metadata$Sequencing.Batch)
fviz_pca_ind(res.pca, label = "none", col.ind = groups,addEllipses = FALSE)
```

While I understand there is only two samples in the second batch, I think I should adjust for them. Or **do you think I should just simply remove them from the analysis?**

Run a PCA and color the disease types:

```
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca, label = "none", col.ind = groups, addEllipses = FALSE)
```

Some disease types cluster together (for example the dark yellow + on the bottom right) and so I need to adjust for this and possibly age.

Now I use the `ComBat`

function to correct for batch effect while adjusting for disease type and age:

```
batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease) + AGE, data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
```

I have two questions here. **Shouldn't PCA analysis of combat_tpm look more homogenous than the original dataset?**

```
res.pca_combat <- prcomp(t(combat_tpm), scale = T)
groups <- as.factor(metadata$disease)
fviz_pca_ind(res.pca_combat, label = "none",col.ind = groups, addEllipses = FALSE)
```

Maybe I am not familiar with typical corrections with such tools, but It does not look to me that the confounding covariate is adjusted? I was expecting to see more uniform distribution of my samples.

and my second question. **If I exclude AGE from my model, it gives me an error and I can not understand why**.

```
batch <- metadata$Sequencing.Batch
modcombat <- model.matrix(~ as.factor(disease), data= metadata)
combat_tpm = ComBat(dat=matrix_good, batch=batch, mod=modcombat, par.prior=TRUE, prior.plots=FALSE)
Error in while (change > conv) {: missing value where TRUE/FALSE needed
```

based on the post above, I thought this error occurs when variance is very low, but here, my expression data is the same. Interestingly, in line with the post above, it seems to run well (but a lot longer) if, `par.prior=FALSE`

. I get the same error if I quantile normalize the data before running ComBat.

I appreciate if you could help me understand this better.

Thank you