Question: Correcting for batch effect in RNA-seq data
18 months ago by
Hebrew University of Jerusalem
Rimma (30) wrote:

I used DESeq2 to process RNA-seq data from different sources and found a strong batch effect in the PCA plot (the point shapes represent the 3 batches; for example, ctr and PH.7d samples from different batches cluster apart):

[Figure: PCA plot; point shapes indicate batch]

I tried to remove it using the limma package as described here:

    > colData
           sample   condition batch
    1         100       PH.7d     1
    7          75         ctr     1
    8  SRR5035380 hblast.10.5     2
    25 SRR5035397 hblast.18.5     2
    26 SRR8437299         ctr     3
    37 SRR8437324       PH.7d     3

    data2 <- plotPCA(vsd, intgroup = c('condition', 'batch'), returnData = TRUE)
    percentVar <- round(100 * attr(data2, 'percentVar'))

However, there are no changes when I plot the results:

[Figure: PCA plot after the attempted batch correction]

What am I doing wrong?

Also, I tried to remove the batch effect using the design formula in DESeq2:

    ddsB <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ batch + condition)

I'm getting this error:

    Error in checkFullRank(modelMatrix) :
      the model matrix is not full rank, so the model cannot be fit as specified.
      One or more variables or interaction terms in the design formula are linear
      combinations of the others and must be removed.

Can somebody help me to solve it? Thanks in advance!

batch effect rna-seq • 1.5k views
modified 17 months ago by ATpoint (44k) • written 18 months ago by Rimma (30)

Are you sure that vsd$data1 corresponds to the vector encoding the batch variable? Seems to me it should be vsd$batch.

written 18 months ago by Friederike (6.7k)

It looks like batch 2 doesn't contain any of the groups in batches 1 and 3, so it is not possible to correct for that batch. Are you sure there is at least one group in batch 2 that is also found in batch 1 or 3?

written 17 months ago by Benn (8.0k)
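
A minimal sketch of how the overlap described above could be checked in base R. It assumes a colData holding only the six rows printed in the question (the full sample table is not shown, so this data frame is hypothetical). It cross-tabulates condition against batch and compares the rank of the `~ batch + condition` model matrix with its number of columns, which is the condition the "not full rank" error complains about.

```r
# Hypothetical colData mirroring the six rows printed in the question;
# the real table has more samples that are not shown in the thread.
colData <- data.frame(
  sample    = c("100", "75", "SRR5035380", "SRR5035397", "SRR8437299", "SRR8437324"),
  condition = factor(c("PH.7d", "ctr", "hblast.10.5", "hblast.18.5", "ctr", "PH.7d"),
                     levels = c("ctr", "PH.7d", "hblast.10.5", "hblast.18.5")),
  batch     = factor(c(1, 1, 2, 2, 3, 3))
)

# Cross-tabulate condition by batch to see which groups are shared between batches.
overlap <- table(colData$condition, colData$batch)
print(overlap)

# The design is full rank only if the matrix rank equals its number of columns.
mm <- model.matrix(~ batch + condition, data = colData)
full_rank <- qr(mm)$rank == ncol(mm)

# Keeping only batches 1 and 3, which share ctr and PH.7d, and repeating the check.
shared <- droplevels(colData[colData$batch != "2", ])
mm_shared <- model.matrix(~ batch + condition, data = shared)
full_rank_shared <- qr(mm_shared)$rank == ncol(mm_shared)

print(c(all_batches = full_rank, batches_1_and_3 = full_rank_shared))
```

In this toy layout the batch 2 column equals the sum of the two hblast condition columns, so batch and condition cannot be separated for those samples. Whether the same holds for the real data depends on the rows not shown in the question.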
17 months ago by
ATpoint (44k) wrote:

From what I've seen, RNA-seq is strongly confounded by the kit and library preparation method. The confounding effect probably dominates any kind of biological difference; see here, for example, a PCA that I made from five independent data sources, all processed identically on the in silico side.

Edit: Check whether correct use of batch removal, as Benn says below, can limit the confounding effect.

[Figure: PCA of samples from five independent data sources]

modified 17 months ago • written 17 months ago by ATpoint (44k)

But you can correct for batch when there are overlapping groups. However, I suspect that OP's batch 2 doesn't have any overlapping group...

written 17 months ago by Benn (8.0k)

True, but to what extent? Do you have experience with how well this works? I mean, "mild" batch effects like different culture conditions in the lab, samples taken on different days, or different sequencing protocols might be correctable, but can you really "regress out" the effect of different kits and laboratories?

written 17 months ago by ATpoint (44k)

I only have experience with removeBatchEffect() from edgeR/limma; it works fine, especially for visualization. Clearly the limma::removeBatchEffect code from OP did not work properly, as Friederike already suspected.

modified 17 months ago • written 17 months ago by Benn (8.0k)
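
For reference, a hedged sketch of how limma::removeBatchEffect is typically called for visualization. OP's actual call is not shown in the thread, so this is not a reconstruction of it. The expression matrix, the batch shifts and the balanced two-condition layout are all simulated assumptions standing in for assay(vsd). The batch factor is passed explicitly through `batch`, and the condition effect is protected through `design`.

```r
# Sketch only: simulated log-scale values stand in for assay(vsd).
library(limma)

set.seed(1)
n_genes <- 200

# Hypothetical balanced layout: both conditions are present in every batch.
batch     <- factor(rep(c("1", "2", "3"), each = 4))
condition <- factor(rep(c("ctr", "PH.7d"), times = 6), levels = c("ctr", "PH.7d"))

# Random expression values plus a per-gene shift for each batch (the simulated batch effect).
mat <- matrix(rnorm(n_genes * length(batch), mean = 8), nrow = n_genes)
batch_shift <- matrix(rnorm(n_genes * nlevels(batch), sd = 2), nrow = n_genes)
mat <- mat + batch_shift[, as.integer(batch)]

# Pass the batch factor itself; `design` keeps the condition effect from being removed.
design <- model.matrix(~ condition)
corrected <- removeBatchEffect(mat, batch = batch, design = design)

# Largest per-gene difference between batch means, before and after correction.
batch_means <- function(m) {
  sapply(levels(batch), function(b) rowMeans(m[, batch == b, drop = FALSE]))
}
spread_before <- max(apply(batch_means(mat), 1, function(r) diff(range(r))))
spread_after  <- max(apply(batch_means(corrected), 1, function(r) diff(range(r))))
print(c(before = spread_before, after = spread_after))

# PCA on the corrected matrix (samples as rows), e.g. for plotting.
pca <- prcomp(t(corrected))
```

With a DESeq2 object the analogous step would be along the lines of `assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch, design = model.matrix(~ condition, colData(vsd)))` followed by plotPCA(vsd). The corrected values are meant for plots, and this only works when the batches share conditions, as discussed above.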