I have 9 samples for bulk RNA-seq analysis with DESeq2. I have 2 conditions and 2 batches. Below is the sample table.
My design is based on both batch and condition and the PCA plot shows that C1,C2, and C3 samples are separated. below is the code and the PCA plot.
dds <- DESeqDataSetFromTximport(txi.rsem, sampleTable, ~Batch+condition) dds <- DESeq(dds) vsd <- vst(dds, blind=FALSE) plotPCA(vsd, intgroup=c("condition", "Batch"))+ geom_text(aes(label=name),vjust=2)
I know that it is recommended not to remove the batch effect but I was trying to check and see what would be the results after removing the batch effect. I again see that the control samples C1,C2,C3 are not clustered with the other control C4 from another batch. Below is the code and PCA plot after removing the batch effect.
mat <- assay(vsd) mat2 <- limma::removeBatchEffect(mat, vsd$Batch) assay(vsd) <- mat2 plotPCA(vsd, intgroup=c("condition", "Batch")) + geom_text(aes(label=name),vjust=2)
My question is does it make sense to go further and do the DEG analysis? or my samples are not clustered/separated rationally and the results would not make any sense? I think I cannot remove C4 because that is the only connection between the samples of different batches (if that makes sense!). I was hoping that C1,C2, and C3 would cluster together with C4 and then the comparison between the Control and Treatment of different batches would be accurate.