Differential expression: batch effects
4.1 years ago
Ml6237 • 0

Greetings!

I’m doing differential expression analysis using DESeq2 and would like advice on batch effects, please. I have one experimental factor with four levels (“condition”: A, B, C, D). In the PCA plot, samples separate by condition along PC1 (~34% of variance). There is a batch effect (two tissue sampling dates), but it only separates samples along PC2; no separation by batch was observed along PC1. I was therefore thinking of performing DE with batch as a covariate in the model (~batch + condition), and then using variance-stabilised counts batch-corrected via limma’s removeBatchEffect() for downstream work such as heatmaps and gene expression boxplots, as documented in the DESeq2 vignette.

mat <- assay(vsd)
mat <- limma::removeBatchEffect(mat, batch = vsd$batch)

However, my problem is that the batches are not evenly distributed amongst groups, and I realise this is not optimal (group-batch assignments below) but it is the data I have been given. Although the design is possibly not completely confounded, condition D is not great, as all of its samples come from batch 1. I would rather not toss data if possible. So, my question is whether it is valid to perform the DE analysis and generate the batch-corrected counts as I’ve described, given the unbalanced design. Or, as the batch effect is along PC2 not PC1, is it less risky to not batch correct than to batch correct with an unbalanced design (I'm thinking probably no?)?

Any advice would be much appreciated, thanks.

condition   batch1   batch2
A           3        2
B           1        4
C           1        4
D           5        0
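
For reference, here is a minimal sketch of the workflow described above. It is an illustration under assumptions, not a definitive recipe: the counts are simulated with DESeq2's makeExampleDESeqDataSet(), only the sample layout is taken from the table, and the B vs A contrast is an arbitrary example. The one difference from the two-line snippet earlier is that removeBatchEffect() is given a design matrix for condition, which is how the DESeq2 vignette calls it so that condition differences are preserved while batch is removed.

```r
# Sketch only: simulated counts stand in for the real data.
library(DESeq2)
library(limma)

set.seed(1)

# Sample layout copied from the table: 5 samples per condition, 2 batches.
condition <- factor(rep(c("A", "B", "C", "D"), each = 5))
batch <- factor(c(1, 1, 1, 2, 2,   # A: 3 in batch 1, 2 in batch 2
                  1, 2, 2, 2, 2,   # B: 1 and 4
                  1, 2, 2, 2, 2,   # C: 1 and 4
                  1, 1, 1, 1, 1))  # D: 5 and 0

# Assumption: simulated data set with 3000 genes and 20 samples.
dds <- makeExampleDESeqDataSet(n = 3000, m = 20)
dds$condition <- condition
dds$batch <- batch
design(dds) <- ~ batch + condition  # batch as a covariate in the DE model

dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "B", "A"))  # example contrast

# For visualisation only: remove batch from the VST values while telling
# limma which variation (condition) should be kept.
vsd <- vst(dds, blind = FALSE)
mm <- model.matrix(~ condition, colData(vsd))
mat <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch, design = mm)
assay(vsd) <- mat
```

The DE results come from the model with batch as a covariate; the batch-removed matrix is used only for plots such as heatmaps and PCA.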
RNA-Seq DESeq2 • 1.9k views

Supposing you have at least three replicates per condition, you can only compare B vs C, since A and D are confounded with batches 3 and 5, respectively.


Wait, I am looking at the table at the bottom of the question... Are those the numbers of replicates in batch 1 and batch 2? If that is the case, then perhaps there is no complete confounding.
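
One way to check for complete confounding is to build the design matrix and test whether it has full column rank. The sketch below uses base R only and assumes the table gives the number of samples per condition and batch; the object names are illustrative.

```r
# Cell counts copied from the table in the question.
counts <- matrix(c(3, 2,
                   1, 4,
                   1, 4,
                   5, 0),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(condition = c("A", "B", "C", "D"),
                                 batch = c("1", "2")))

# Expand the table into one row per sample.
samples <- as.data.frame(as.table(counts))
samples <- samples[rep(seq_len(nrow(samples)), samples$Freq),
                   c("condition", "batch")]

design <- model.matrix(~ batch + condition, data = samples)

# Full column rank means every coefficient is estimable, i.e. batch and
# condition are not completely confounded in the additive model.
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)
```

For this layout the design is full rank, so ~batch + condition can be fitted. Condition D has no batch 2 samples, so the batch coefficient is estimated from conditions A, B and C only.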

"However, my problem is that the batches are not evenly distributed amongst groups, and I realise this is not optimal (group-batch assignments below) but it is the data I have been given."

You should relay back to them that an unbalanced study like this is not good.

"Or as the batch effect is along PC2 not PC1, is it less risky to not batch correct than batch correct with an unbalanced design (I'm thinking probably no?)?"

Yes, but what percentage of the variation is explained by PC2? In any case, it implies that the primary source of variation (PC1) is not batch-related.


Hi Kevin and Andres, thanks for your replies. From the table, there are 5 replicates for each condition in total; for condition A, for example, 3 samples are from batch 1 and 2 samples are from batch 2. PC2 explained 9% of the variation.
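
For anyone wanting to reproduce the per-component percentages, here is a small base R sketch. The matrix is random and only stands in for assay(vsd), so the numbers it prints are meaningless; note also that DESeq2's plotPCA() by default uses only the 500 most variable genes, so its percentages can differ from a PCA on all genes.

```r
set.seed(1)

# Hypothetical stand-in for assay(vsd): 500 genes x 20 samples.
mat <- matrix(rnorm(500 * 20), nrow = 500)

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(mat))

# Percentage of total variance explained by each principal component.
percent_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(percent_var[1:2], 1)
```

Running the same calculation on the matrix before and after removeBatchEffect() shows how much of the PC2 variance is attributable to batch.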


...and did you conduct the PCA before or after removeBatchEffect()?


PC2 = 9% before removeBatchEffect().
