Ciao,

Hope you are all having a good day :)

I would like to understand how to analyse RNA-seq data coming from multiple experiments using DESeq2. Initially I had 3 disease vs 3 control samples, and this is how they clustered on a PCA plot:

Then I started looking into a second batch that has 5 disease vs 4 control samples:

Finally, when I analyse all 15 samples together (differential gene expression analysis), this is how they cluster on the PCA plot:

The design formula is as follows:

```
design = ~batch + condition
```
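For readers following along, here is a minimal sketch of how a design like this is typically wired into DESeq2. The count matrix is simulated and the names `counts`, `coldata`, `b1` and `b2` are placeholders, not the data from this thread; only the sample layout (3 vs 3 in the first batch, 5 vs 4 in the second) is taken from the post.

```r
library(DESeq2)

set.seed(1)
n_genes <- 2000

# Sample sheet mirroring the thread: batch 1 = 3 disease + 3 control,
# batch 2 = 5 disease + 4 control (15 samples in total).
coldata <- data.frame(
  batch = factor(rep(c("b1", "b2"), times = c(6, 9))),
  condition = factor(
    c(rep(c("disease", "control"), each = 3),
      rep(c("disease", "control"), times = c(5, 4))),
    levels = c("control", "disease")  # control is the reference level
  ),
  row.names = paste0("s", 1:15)
)

# Simulated negative binomial counts (assumption: stand-in for real data).
# The per-gene means are recycled down each column of the matrix.
gene_mean <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * 15, mu = gene_mean, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# Batch goes first, the variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ batch + condition)

# DESeq() estimates per-sample size factors, which account for
# differences in sequencing depth, before fitting the model.
dds <- DESeq(dds)

# Disease vs control, adjusted for batch.
res <- results(dds, contrast = c("condition", "disease", "control"))
```

The raw counts are passed in unchanged; the batch term in the design is what adjusts the condition comparison for the batch difference.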

The sequencing depth differs between the two batches. Can I apply quantile normalization to the data to reduce the variance between samples of the same condition?

Please let me know how I could go about it. Thanks in advance.

Have a lovely day :)

Thanks for your valuable response, ATpoint!

Here's the PCA of the samples after removing the batch effect:

Please correct me if I'm wrong: the visualisation we get from limma's removeBatchEffect corresponds to including batch in the DESeq2 design formula, as mentioned in the post above. Is that correct?

Well, not exactly the same, but from what I understand it is a sufficient proxy for deciding whether including batch in the design makes sense. Here that seems to be the case.
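To make the distinction concrete, here is a rough sketch of the visualisation side: variance-stabilise the counts, remove the batch term with limma's `removeBatchEffect` while protecting the condition effect, and plot the PCA. The data are simulated with a gene-specific batch shift, and `counts`, `coldata`, `batch_fc` and `design_keep` are hypothetical names, so treat it as an illustration rather than the exact code behind the plot above.

```r
library(DESeq2)
library(limma)

set.seed(1)
n_genes <- 2000

# Same sample layout as in the thread (6 samples in batch 1, 9 in batch 2).
coldata <- data.frame(
  batch = factor(rep(c("b1", "b2"), times = c(6, 9))),
  condition = factor(
    c(rep(c("disease", "control"), each = 3),
      rep(c("disease", "control"), times = c(5, 4))),
    levels = c("control", "disease")
  ),
  row.names = paste0("s", 1:15)
)

# Simulated counts with a gene-specific batch fold change (assumption).
gene_mean <- 2^runif(n_genes, 4, 10)
batch_fc <- 2^rnorm(n_genes, sd = 1)
mu <- matrix(gene_mean, nrow = n_genes, ncol = 15)
mu[, coldata$batch == "b2"] <- gene_mean * batch_fc
counts <- matrix(
  rnbinom(n_genes * 15, mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ batch + condition)

# Variance-stabilised values for plotting; blind = FALSE uses the design.
vsd <- vst(dds, blind = FALSE)

# Remove the batch term but keep the condition effect via `design`.
design_keep <- model.matrix(~ condition, data = coldata)
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch,
                                       design = design_keep)

# PCA on the batch-corrected values, for visual inspection only.
plotPCA(vsd, intgroup = c("condition", "batch"))
```

The corrected matrix is meant for plots and clustering. For the differential expression test itself, the raw counts go into DESeq2 with `~ batch + condition`, which is why the two approaches are related but not identical.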