Question: Batch effect problem DEG, DESseq2
1
gravatar for maria2019
6 months ago by
maria201990
maria201990 wrote:

I have 9 samples for bulk RNA-seq analysis with DESeq2. I have 2 conditions and 2 batches. Below is the sample table.

enter image description here

My design is based on both batch and condition and the PCA plot shows that C1,C2, and C3 samples are separated. below is the code and the PCA plot.

dds <- DESeqDataSetFromTximport(txi.rsem, sampleTable, ~Batch+condition)
dds <- DESeq(dds)
vsd <- vst(dds, blind=FALSE)
plotPCA(vsd, intgroup=c("condition", "Batch"))+ geom_text(aes(label=name),vjust=2)

enter image description here

I know that it is recommended not to remove the batch effect but I was trying to check and see what would be the results after removing the batch effect. I again see that the control samples C1,C2,C3 are not clustered with the other control C4 from another batch. Below is the code and PCA plot after removing the batch effect.

mat <- assay(vsd)
mat2 <- limma::removeBatchEffect(mat, vsd$Batch)
assay(vsd) <- mat2
plotPCA(vsd, intgroup=c("condition", "Batch")) + geom_text(aes(label=name),vjust=2)

enter image description here

My question is does it make sense to go further and do the DEG analysis? or my samples are not clustered/separated rationally and the results would not make any sense? I think I cannot remove C4 because that is the only connection between the samples of different batches (if that makes sense!). I was hoping that C1,C2, and C3 would cluster together with C4 and then the comparison between the Control and Treatment of different batches would be accurate.

deg rna-seq deseq2 batch-effect • 393 views
ADD COMMENTlink modified 6 months ago by ATpoint31k • written 6 months ago by maria201990
1

Please search the site for "batch effect" and read posts. Your batches are unevenly distributed, with all treatment samples found in the same batch. If you were to adjust for batch effect, DESeq2 will not be able to differentiate changes owing to treatment and changes owing to batch effect, even with the lone control in B2. You should definitely use batch as a covariate, but results probably will not have a lot of statistical power.

ADD REPLYlink written 6 months ago by RamRS26k

Thank you for your response. I am reading other posts. based on your comment, do you suggest removing the batch effect step or just adjusting results based on batch effect and including batch as a covariate? (I assume the second one?)

ADD REPLYlink written 6 months ago by maria201990
1

"Removing" batch effect would be a lot more damaging that accounting for it as a covariate, especially in this case where your dataset might end up losing most of the significant differences even before DE analysis begins. Covariate assumption is the lesser of two evils here, but the true problem is experiment design.

ADD REPLYlink written 6 months ago by RamRS26k
2

Will DESeq allow batch to be a covariate if it's totally confounded in one condition? The second example seems applicable

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#model-matrix-not-full-rank

ADD REPLYlink written 6 months ago by swbarnes27.5k

Good catch - DESeq cannot account for the batch effect here. Looks like OP is going to have to operate without accounting for batch effects.

ADD REPLYlink written 6 months ago by RamRS26k
7
gravatar for ATpoint
6 months ago by
ATpoint31k
Germany
ATpoint31k wrote:

Batch correction requires replicates of each group in both batches. Your experiment is fully confounded with all treatments in B2. Sorry to say but this is simply an unfortunate experimental design and I do not think any in silico methods can save you from that.

ADD COMMENTlink written 6 months ago by ATpoint31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1351 users visited in the last hour