Question: Advice for the following PCA analysis
Mozart (80) wrote 11 months ago:

Hello there, I am running an RNA-seq analysis comparing 4 conditions (WT-treated, WT-untreated, KO-treated, KO-untreated), and I think the following PCA is affected by a batch effect.

red=KO-untreated
green=KO-treated
blue=WT-untreated
violet=WT-treated

[PCA plot image]

First of all, can you confirm that this kind of bias might be present? Secondly, how would you recommend proceeding?

rna-seq
modified 11 months ago • written 11 months ago by Mozart (80)

Can you also provide the legend?

modified 11 months ago • written 11 months ago by reza.jabal (300)

And can you please elaborate on green and violet? They are both labelled KO-treated; what is the difference between them?

written 11 months ago by reza.jabal (300)

Sorry, I have just edited the legend.

written 11 months ago by Mozart (80)

Yes, there is a clear 'bias', as evidenced by the variation explained by PC1. I put the word 'bias' in quotation marks because, on the off chance, there may be a biological explanation for the finding.
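To illustrate what "variation explained by PC1" means, here is a minimal base R sketch on simulated data. Everything in it is an assumption made for illustration (the 8 samples, the 200 genes, the two batches named `b1` and `b2`, and the size of the shift); it is not the poster's data. It shows how a shift affecting one batch can dominate PC1.

```r
# Simulated data only: 8 hypothetical samples x 200 hypothetical genes.
set.seed(1)
batch <- factor(rep(c("b1", "b2"), each = 4))
expr <- matrix(rnorm(8 * 200), nrow = 8,
               dimnames = list(paste0("sample", 1:8), paste0("gene", 1:200)))

# Add an artificial shift to every gene in the second batch.
expr[batch == "b2", ] <- expr[batch == "b2", ] + 3

# PCA with samples as rows, genes as columns.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_explained[1:3], 1)

# PC1 scores grouped by batch: a clean split suggests a batch-driven axis.
split(round(pca$x[, "PC1"], 1), batch)
```

If PC1 separates the samples by processing batch rather than by condition, a technical explanation is more likely. If it separates them by genotype or treatment, the effect may well be biological.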

Were those samples processed in a different batch? Are they the KO or the WT? There is no legend in your plot.

Edit: thanks for editing your post to define the groupings.

modified 11 months ago • written 11 months ago by Kevin Blighe (37k)

Very sorry about that. I have just edited the legend.

written 11 months ago by Mozart (80)

If they simply come from a different batch, then include batch as a variable in the design formula, assuming that you're running DESeq2. That will most likely mitigate the batch effect.
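As a rough sketch of what such a design looks like, the base R snippet below builds a design matrix from a made-up sample table. The column names `batch` and `condition` and the 8 samples are hypothetical. The same formula would be passed as the design argument of DESeqDataSetFromTximport(), as in the calls quoted later in this thread.

```r
# Hypothetical sample table: 8 samples, 2 batches, 4 conditions.
sampleTable <- data.frame(
  batch = factor(rep(c("batch1", "batch2"), each = 4)),
  condition = factor(rep(c("KO_CTL", "KO_TRE", "WT_CTL", "WT_TRE"), times = 2))
)

# Batch goes first, the variable of interest goes last.
design <- ~ batch + condition

# Inspect the columns (coefficients) this design would estimate.
mm <- model.matrix(design, data = sampleTable)
colnames(mm)

# The matrix must be full rank: batch must not be confounded with condition.
qr(mm)$rank == ncol(mm)
```

If every sample of one condition sits in a single batch, the rank check fails, and the batch effect cannot be separated from the condition effect by any model.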

modified 11 months ago • written 11 months ago by Kevin Blighe (37k)

Hi Kevin, yep I have done that using sva.

For Kevin (I hope he will read this, since I am not able to write another reply for the next 24 hours): let's see if I have understood your suggestion correctly. Instead of doing this:

dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3+condition)

Are you suggesting that I type this instead?

 dds <- DESeqDataSetFromTximport(txi.kallisto.tsv, sampleTable, ~batch1+batch2+batch3)

Thanks for your help, Kevin. I am afraid I have just one column with all the possible conditions: KO_CTL, KO_TRE, WT_CTL, WT_TRE. My resultsNames(dds) output is:

[1] "Intercept"                  "condition_KO_TRE_vs_KO_CTL" "condition_WT_CTL_vs_KO_TRE"
[4] "condition_WT_TRE_vs_KO_CTL"

Probably I am doing something wrong.
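For context on names like those above: they follow a level_vs_reference pattern, and in R the reference is the first factor level, which is alphabetical by default. A small base R sketch of how the reference level shapes the coefficients follows. The `condition` vector is a toy example, and base R's model.matrix() labels the columns differently from DESeq2.

```r
# Toy factor with the four condition labels from the post.
condition <- factor(c("KO_CTL", "KO_TRE", "WT_CTL", "WT_TRE"))

# The first level is the reference (alphabetical unless set explicitly).
levels(condition)[1]

# One coefficient per non-reference level, each relative to the reference.
colnames(model.matrix(~ condition))

# Changing the reference changes what each coefficient is compared against.
condition_wt <- relevel(condition, ref = "WT_CTL")
colnames(model.matrix(~ condition_wt))
```

Comparisons that do not involve the reference level (for example WT_TRE vs WT_CTL when KO_CTL is the reference) are usually obtained by specifying a contrast rather than by refitting.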

modified 11 months ago • written 11 months ago by Mozart (80)

That seems to have improved it. Can you nevertheless just include batch as a covariate in the DESeq2 design formula? I am almost certain that this will mitigate the effect that you see (if indeed those samples on the right-hand side of your plot are from a different batch).

modified 11 months ago • written 11 months ago by Kevin Blighe (37k)

Hi, I can see your edited post. Why do you have 3 batch variables? There should be just a single batch variable. Your sample table should look something like this:

Batch   Treatment  Group
batch1  untreated  CTL
batch1  treated    LAP
batch2  treated    CTL
batch2  untreated  CTL
etc.

Then use:

~ Batch + Treatment + Group

You could also merge Treatment and Group into a single variable with paste(), if you wish.
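A minimal base R sketch of both options, using a made-up sample table. The column names follow the table above, but the 8 samples and the WT/KO and treated/untreated values are assumptions for illustration.

```r
# Hypothetical sample table: each batch contains all four Group/Treatment combinations.
sampleTable <- data.frame(
  Batch     = factor(rep(c("batch1", "batch2"), each = 4)),
  Treatment = factor(rep(c("untreated", "treated"), times = 4)),
  Group     = factor(rep(rep(c("WT", "KO"), each = 2), times = 2))
)

# Option 1: additive design with separate Treatment and Group terms.
mm_additive <- model.matrix(~ Batch + Treatment + Group, data = sampleTable)
colnames(mm_additive)

# Option 2: merge Group and Treatment into one factor with paste().
sampleTable$condition <- factor(
  paste(sampleTable$Group, sampleTable$Treatment, sep = "_")
)
mm_merged <- model.matrix(~ Batch + condition, data = sampleTable)
colnames(mm_merged)
```

The merged design has one more coefficient than the additive one, because a four-level combined factor implicitly includes the Group-by-Treatment interaction. It also makes pairwise comparisons between any two of the four conditions straightforward to request.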

modified 11 months ago • written 11 months ago by Kevin Blighe (37k)