Question: Advice for the following PCA analysis
15 months ago, Mozart (130) wrote:

Hello there, I am running RNA-seq analysis on the following data: I am comparing 4 different conditions (WT-treated, WT-untreated, KO-treated, KO-untreated) and I think the following PCA is affected by a batch effect.

red=KO-untreated
green=KO-treated
blue=WT-untreated
violet=WT-treated

[Image: PCA plot of the samples, colored by condition as listed above]

First of all, can you confirm that this kind of bias might be present? Secondly, how would you recommend proceeding?

rna-seq • 650 views
modified 15 months ago • written 15 months ago by Mozart (130)

Can you also provide the legend?

modified 15 months ago • written 15 months ago by reza.jabal (320)

And can you please elaborate on green and violet? They are both KO-treated; what is the difference between them?

written 15 months ago by reza.jabal (320)

Sorry, I have just edited the legend

written 15 months ago by Mozart (130)

Yes, there is a clear 'bias', as evidenced by the variation explained by PC1. I put the word 'bias' in quotation marks because, on the off chance, there may be a biological explanation for the finding.

Were those samples processed in a different batch? Are they the KO or WT? There is no legend in your plot.

Edit: thanks for editing your post to define the groupings

modified 15 months ago • written 15 months ago by Kevin Blighe (43k)

Very sorry about that. I have just edited the legend.

written 15 months ago by Mozart (130)

If they are just a different batch, then just include batch as a variable in the design model, assuming that you're running DESeq2. That will most likely mitigate the batch effect.

modified 15 months ago • written 15 months ago by Kevin Blighe (43k)

Hi Kevin, yep, I have done that using sva.

For Kevin (I hope he will read this, since I am not able to post another reply for the next 24 hours): let's see if I have understood your suggestion correctly. Instead of doing this:

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3+condition)

are you suggesting that I type this?

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3)

Thanks for your help, Kevin. I am afraid I have just one column with all the possible conditions: KO_CTL, KO_TRE, WT_CTL, WT_TRE. My resultsNames(dds) output is

[1] "Intercept"                  "condition_KO_TRE_vs_KO_CTL" "condition_WT_CTL_vs_KO_TRE"
[4] "condition_WT_TRE_vs_KO_CTL"

Probably I am doing something wrong.
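For context, here is a minimal base-R sketch of how a single four-level condition column is turned into model coefficients. It only illustrates R's default treatment coding, which DESeq2 designs build on; the two replicates per condition and the variable names are assumptions, not taken from the actual sample table.

```r
# Hypothetical condition column: four levels, two replicates each
# (the replicate count is an assumption for illustration).
condition <- factor(rep(c("KO_CTL", "KO_TRE", "WT_CTL", "WT_TRE"), each = 2))

# The first factor level is the reference; alphabetical order makes it KO_CTL.
levels(condition)[1]

# Treatment coding gives an intercept plus one column per non-reference level,
# so a four-level factor yields four coefficients in total.
design <- model.matrix(~ condition)
colnames(design)

# To compare everything against WT_CTL instead, change the reference level
# before building the design.
condition_wt <- relevel(condition, ref = "WT_CTL")
design_wt <- model.matrix(~ condition_wt)
colnames(design_wt)
```

Seeing an intercept plus three comparisons against one reference level is therefore expected rather than a sign of a mistake. In DESeq2, comparisons that do not involve the reference level can usually be requested with the contrast argument, for example results(dds, contrast = c("condition", "WT_TRE", "WT_CTL")).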

modified 15 months ago • written 15 months ago by Mozart (130)

That seems to have improved it. Can you nevertheless just include batch as a covariate in the DESeq2 design model? I am almost certain that this will mitigate the effect that you see (if indeed those samples on the right-hand side of your plot are from a different batch).

modified 15 months ago • written 15 months ago by Kevin Blighe (43k)

Hi, I can see your edited post. Why do you have 3 batch variables? There should be just a single batch variable. Your sample metadata should look something like this:

Batch   Treatment  Group
batch1  untreated  CTL
batch1  treated    LAP
batch2  treated    CTL
batch2  untreated  CTL
etc.

Then use:

~batch+Treatment+Group

You could also merge Treatment and Group into a single variable with paste(), if you wish.
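As a rough sketch of the two options above, the following base-R snippet builds a hypothetical metadata table (two batches, two treatments, and WT/KO groups; all values are assumed for illustration) and shows the design columns produced by the additive formula and by a merged variable created with paste().

```r
# Hypothetical sample metadata; the real table will differ.
sampleTable <- data.frame(
  Batch     = factor(rep(c("batch1", "batch2"), times = 4)),
  Treatment = factor(rep(rep(c("untreated", "treated"), each = 2), times = 2),
                     levels = c("untreated", "treated")),
  Group     = factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO"))
)

# Option 1: a single batch variable plus separate Treatment and Group terms.
design_additive <- model.matrix(~ Batch + Treatment + Group, data = sampleTable)
colnames(design_additive)

# Option 2: merge Group and Treatment into one variable with paste().
sampleTable$condition <- factor(
  paste(sampleTable$Group, sampleTable$Treatment, sep = "_")
)
design_merged <- model.matrix(~ Batch + condition, data = sampleTable)
colnames(design_merged)

# With DESeq2, the same formula would be passed as the design, e.g.
# DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~ Batch + condition)
```

Note that the two options are not identical models: the merged variable lets each group-by-treatment combination have its own mean, which corresponds to including an interaction, whereas the additive formula assumes the treatment effect is the same in both groups.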

modified 15 months ago • written 15 months ago by Kevin Blighe (43k)
Powered by Biostar version 2.3.0