Dear community,
I have a problem with Combat-Seq, which I used to correct batch effects in my RNA-seq data.
I have posted a picture at the bottom, which shows you some PCA plots I generated. I have some miRNA RNA-seq data which shows clear batch separation mostly along PC2, but to some extent along PC1 as well (top left plot).
I used Combat-Seq to try and correct these batch effects, but as you can see from the plots it also introduces new clustering (bottom left). I don't understand why this is happening and would be grateful for some experiences, tips or help
The command I used to do the batch correction is this (here Combat-Seq is wrapped in the BatchQC R package):
se_object <- BatchQC::batch_correct(se = se_object, method = "ComBat-Seq",
assay_to_normalize = "counts", batch = "batch_no",
covar = c("v1_stat", "v1_age", "v1_sex"), output_assay_name = "CombatSeq_corrected")
the covariates refer to case-control status (v1_stat), age and sex. Normalized expression on the right was generated using median of ratios (DESeq2 default)
Any help or explanation would be greatly appreciated.
Could you add "sex" as a covariate to the plot?
Here is the same plot with sex as color. I also added a confounding table. As you can see, the batch variable is highly correlated with the clinical status (case/control). This is due to the distribution of samples across batches with many batches containing only cases or controls and only a few mixed samples.
Could this be the reason for CombatSeq not working properly? And do you have any suggestions on how to deal with this problem?
Thanks for your help!