Question

Batch effect problem in DEG analysis with DESeq2

4

5.8 years ago

maria2019 ▴ 250

I have 9 samples for bulk RNA-seq analysis with DESeq2. I have 2 conditions and 2 batches. Below is the sample table.

enter image description here

My design includes both batch and condition, and the PCA plot shows that samples C1, C2, and C3 are separated from the rest. Below are the code and the PCA plot.

library(DESeq2)
library(ggplot2)  # needed for geom_text() and aes()
dds <- DESeqDataSetFromTximport(txi.rsem, sampleTable, ~ Batch + condition)
dds <- DESeq(dds)
vsd <- vst(dds, blind = FALSE)
plotPCA(vsd, intgroup = c("condition", "Batch")) + geom_text(aes(label = name), vjust = 2)

enter image description here

I know that removing the batch effect is not recommended, but I wanted to see what the results would look like after removing it. Again, the control samples C1, C2, and C3 do not cluster with the other control, C4, from the other batch. Below are the code and the PCA plot after removing the batch effect.

# Subtract the batch term from the transformed values, then re-plot the PCA
mat <- assay(vsd)
mat2 <- limma::removeBatchEffect(mat, batch = vsd$Batch)
assay(vsd) <- mat2
plotPCA(vsd, intgroup = c("condition", "Batch")) + geom_text(aes(label = name), vjust = 2)

enter image description here

My question is: does it make sense to go further and do the DEG analysis? Or are my samples not clustered/separated rationally, so that the results would not make any sense? I think I cannot remove C4, because it is the only connection between the samples of the different batches (if that makes sense!). I was hoping that C1, C2, and C3 would cluster together with C4, so that the comparison between Control and Treatment across batches would be accurate.

DESeq2 batch-effect RNA-seq DEG • 3.6k views

ADD COMMENT • link updated 20 months ago by ATpoint 88k • written 5.8 years ago by maria2019 ▴ 250

1

Please search the site for "batch effect" and read the existing posts. Your batches are unevenly distributed, with all treatment samples found in the same batch. If you were to adjust for batch effect, DESeq2 would not be able to distinguish changes owing to treatment from changes owing to batch, even with the lone control in B2. You should definitely use batch as a covariate, but the results probably will not have a lot of statistical power.

ADD REPLY • link 5.8 years ago by Ram 45k

0

Thank you for your response. I am reading other posts. Based on your comment, do you suggest the batch-effect removal step, or just including batch as a covariate in the design? (I assume the second one?)

ADD REPLY • link 5.8 years ago by maria2019 ▴ 250

1

"Removing" batch effect would be a lot more damaging than accounting for it as a covariate, especially in this case, where your dataset might end up losing most of the significant differences even before DE analysis begins. Treating batch as a covariate is the lesser of two evils here, but the true problem is the experimental design.

ADD REPLY • link 5.8 years ago by Ram 45k
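
To make the point about "removing" batch effect concrete, here is a minimal base-R sketch. It assumes the layout described in this thread (C1 to C3 in batch B1; C4 and five treated samples in B2), uses made-up sample names T1 to T5, and uses hypothetical noise-free values for a single gene. Per-batch mean centering stands in for a batch-only removal step; it is a simplification, not limma's implementation.

```r
# Assumed layout (inferred from the thread, not from the original sample table):
# C1-C3 in batch B1; C4 plus five treated samples (hypothetical names T1-T5) in B2.
samples   <- c("C1", "C2", "C3", "C4", "T1", "T2", "T3", "T4", "T5")
batch     <- factor(c(rep("B1", 3), rep("B2", 6)))
condition <- factor(c(rep("Control", 4), rep("Treatment", 5)))

# Hypothetical values for one gene on a log-like scale:
# a treatment effect of 2 and no batch effect at all.
true_effect <- 2
y <- ifelse(condition == "Treatment", true_effect, 0)
names(y) <- samples

# Simplified stand-in for batch-only removal: subtract each batch's own mean.
y_centered <- y - ave(y, batch)

# Treatment-minus-control difference in group means.
effect_of <- function(v) {
  mean(v[condition == "Treatment"]) - mean(v[condition == "Control"])
}
effect_before <- effect_of(y)
effect_after  <- effect_of(y_centered)

print(c(before = effect_before, after = effect_after))
```

In this toy case the difference shrinks from 2 to 0.75 even though there was no batch effect to remove, because most of the B2 mean is treatment signal and centering subtracts it.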

2

Will DESeq allow batch to be a covariate if it's totally confounded with one condition? The second example here seems applicable:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#model-matrix-not-full-rank

ADD REPLY • link 5.8 years ago by swbarnes2 15k
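
One way to check the full-rank question directly is to build the model matrix and compare its rank with its number of columns. The sketch below uses base R only. The sample table is an assumption reconstructed from the thread (the original table is an image), the names T1 to T5 are made up, and `is_full_rank` is a hypothetical helper, not a DESeq2 function.

```r
# Assumed sample table (reconstructed from the thread; T1-T5 are made-up names).
sample_table <- data.frame(
  row.names = c("C1", "C2", "C3", "C4", "T1", "T2", "T3", "T4", "T5"),
  Batch     = factor(c(rep("B1", 3), rep("B2", 6))),
  condition = factor(c(rep("Control", 4), rep("Treatment", 5)))
)

# Hypothetical helper: TRUE when every coefficient in the design is estimable.
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

# With the single B2 control, the Batch and condition columns differ in one row.
with_c4 <- is_full_rank(~ Batch + condition, sample_table)

# Without C4, the Batch and condition columns are identical.
no_c4 <- sample_table[rownames(sample_table) != "C4", ]
without_c4 <- is_full_rank(~ Batch + condition, no_c4)

print(c(with_C4 = with_c4, without_C4 = without_c4))
```

Under this assumed layout the design is full rank only because of C4; dropping it gives the "model matrix is not full rank" situation from the vignette. Full rank only means the coefficients can be estimated: with one control in B2, batch and treatment are separated by a single sample.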

0

Good catch - DESeq cannot account for the batch effect here. Looks like OP is going to have to operate without accounting for batch effects.

ADD REPLY • link 5.8 years ago by Ram 45k

score 8 · Accepted Answer · 2019-09-26

8

5.8 years ago

ATpoint 88k

Batch correction at best requires replicates of each group in both batches. Your experiment is fully confounded, with all treatments in B2. Sorry to say, but this is simply an unfortunate experimental design, and I do not think any in silico method can save you from that.

ADD COMMENT • link 20 months ago by ATpoint 88k
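
A quick way to see whether each group has replicates in both batches is to cross-tabulate batch against condition before fitting anything. This is a base-R sketch; the layout is again assumed from the thread, and the threshold of two samples per cell is an illustrative choice, not a rule from DESeq2.

```r
# Assumed layout (from the thread): C1-C3 in B1; C4 and five treated samples in B2.
batch     <- factor(c(rep("B1", 3), rep("B2", 6)))
condition <- factor(c(rep("Control", 4), rep("Treatment", 5)))

# Number of samples in every batch-by-condition cell.
counts <- table(Batch = batch, condition = condition)
print(counts)

# Illustrative threshold: at least two samples per cell counts as "replicated".
min_reps <- 2
replicated_everywhere <- all(counts >= min_reps)

# List the cells that fall short of the threshold.
short_cells <- which(counts < min_reps, arr.ind = TRUE)
print(short_cells)
```

For this layout the table has no treated samples in B1 and a single control in B2, so two of the four cells fall short of the threshold.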