Hello there, I am running an RNA-seq analysis comparing 4 different conditions (WT-treated, WT-untreated, KO-treated, KO-untreated), and I think the following PCA is affected by a batch effect.

red=KO-untreated
green=KO-treated
blue=WT-untreated
violet=WT-treated


First of all, can you confirm that there might be this kind of bias? Secondly, how would you recommend proceeding?

Can you also provide the legend?

And can you please elaborate on green and violet? They are both KO-treated, so what is the difference between them?

Sorry, I have just edited the legend

Yes, there is a clear 'bias', as evidenced by the variation explained by PC1. I put the word 'bias' in quotation marks because, on the off chance, there may be a biological explanation for the finding.

Were those samples processed in a different batch? Are they the KO or the WT? There is no legend in your plot.

Edit: thanks for editing your post to define the groupings.

Very sorry about that. I have just edited the legend.

If they are just a different batch, then just include batch as a variable in the design model, assuming that you're running DESeq2. That will most likely mitigate the batch effect.
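To illustrate why adding batch to the design helps, here is a minimal base-R sketch on one hypothetical gene. It uses an ordinary linear model via lm(), not DESeq2's negative binomial GLM, and every value in it (the baseline of 5, the batch shift of 3, the treatment effect of 1, the unbalanced layout) is an assumption made up for the example.

```r
# Hypothetical, noise-free log-expression values for one gene in 8 samples.
# Assumed effects: baseline 5, batch2 adds 3, treatment adds 1.
# The layout is deliberately unbalanced: batch1 is mostly untreated,
# batch2 is mostly treated, so batch and condition are partly confounded.
samples <- data.frame(
  batch = factor(rep(c("batch1", "batch2"), each = 4)),
  condition = factor(
    c("untreated", "untreated", "untreated", "treated",
      "untreated", "treated", "treated", "treated"),
    levels = c("untreated", "treated")
  )
)
samples$y <- 5 +
  3 * (samples$batch == "batch2") +
  1 * (samples$condition == "treated")

# Treatment effect estimated while ignoring batch.
naive <- coef(lm(y ~ condition, data = samples))["conditiontreated"]

# Treatment effect estimated with batch as a covariate.
adjusted <- coef(lm(y ~ batch + condition, data = samples))["conditiontreated"]

print(c(naive = unname(naive), adjusted = unname(adjusted)))
```

In this toy setup the unadjusted estimate absorbs part of the batch shift, while the adjusted one recovers the assumed treatment effect. Real count data are noisier, so treat this only as intuition for what the batch term does.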

Hi Kevin, yep I have done that using sva.

For Kevin (I hope he reads this, since I cannot post another reply for the next 24 hours). Let's see if I have understood your suggestion correctly. Instead of doing this:

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3+condition)


Are you suggesting that I type this instead?

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3)


Thanks for your help, Kevin. I am afraid I have just one column with all the possible conditions: KO_CTL, KO_TRE, WT_CTL, WT_TRE. The output of resultsNames(dds) is:

[1] "Intercept"                  "condition_KO_TRE_vs_KO_CTL" "condition_WT_CTL_vs_KO_TRE"
[4] "condition_WT_TRE_vs_KO_CTL"


I am probably doing something wrong.

That seems to have improved it. Can you nevertheless just include batch as a covariate in the DESeq2 design formula? I am almost certain that it will mitigate the effect that you see (if indeed those samples on the right-hand side of your plot are from a different batch).

Hi, I can see your edited post. Why do you have 3 batch variables? There should be just a single batch variable. Your parameters should be something like this:

Batch   Treatment  Group
batch1  untreated  CTL
batch1  treated    LAP
batch2  treated    CTL
batch2  untreated  CTL
etc.


Then use:

~batch+Treatment+Group


You could also merge Treatment and Group into a single variable with paste(), if you wish.
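A minimal base-R sketch of that layout, assuming a hypothetical 8-sample table (the sample values are made up, and WT/KO stands in for Group to match the original question). It merges the two columns with paste() and uses model.matrix() to show which coefficients a design with a single batch term plus the merged variable would contain. The name `condition` for the merged column is also an assumption.

```r
# Hypothetical sample table: 2 batches, each containing all 4 conditions.
sampleTable <- data.frame(
  batch     = factor(rep(c("batch1", "batch2"), each = 4)),
  Group     = rep(c("WT", "WT", "KO", "KO"), times = 2),
  Treatment = rep(c("CTL", "TRE"), times = 4)
)

# Merge Group and Treatment into one variable, e.g. "WT_CTL".
sampleTable$condition <- factor(
  paste(sampleTable$Group, sampleTable$Treatment, sep = "_")
)

# Set the reference level explicitly; otherwise R picks the first level
# alphabetically (here that would be KO_CTL).
sampleTable$condition <- relevel(sampleTable$condition, ref = "WT_CTL")

# A single batch term plus the merged condition.
design <- ~ batch + condition
mm <- model.matrix(design, data = sampleTable)

# One intercept, one batch column, three condition columns.
print(colnames(mm))
```

With DESeq2, the same formula would go in place of the design in the call quoted earlier, for example DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~ batch + condition). Setting the reference level up front also determines which group the resultsNames(dds) coefficients are compared against.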