Question: Correcting for batch effect in RNA-seq data
Rimma (Hebrew University of Jerusalem) wrote, 8 weeks ago:

I used DESeq2 to process RNA-seq data from different sources, and I found a strong batch effect when I plotted a PCA (the different point shapes represent the 3 batches; for example, ctr and PH.7d samples from different batches cluster apart):

[Image: PCA plot of the samples, point shape indicating batch]

I tried to remove it using the limma package, as described here:

   > colData
      sample   condition batch
1         100       PH.7d     1
7          75         ctr     1
8  SRR5035380 hblast.10.5     2
25 SRR5035397 hblast.18.5     2
26 SRR8437299         ctr     3
37 SRR8437324       PH.7d     3

   data2 <- plotPCA(vsd, intgroup = c('condition', 'batch'), returnData = TRUE)
   percentVar <- round(100 * attr(data2, 'percentVar'))

However, nothing changes when I plot the results:

[Image: PCA plot after the batch-removal attempt]

What am I doing wrong?

I also tried to account for the batch effect through the design formula in DESeq2:

ddsB <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ batch + condition)

But I get this error:

   Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Can somebody help me solve it? Thanks in advance!


Are you sure that vsd$data1 corresponds to the vector encoding the batch variable? Seems to me it should be vsd$batch.

Reply written 8 weeks ago by Friederike

It looks like batch 2 doesn't contain any of the groups in batches 1 and 3, so it is not possible to correct for that batch. Are you sure there is at least one group in batch 2 that is also found in batch 1 or 3?

Reply written 6 weeks ago by Benn
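
As a minimal sketch of why the ~batch+condition design fails here, the base R snippet below rebuilds the batch/condition layout from the six colData rows printed in the question (assuming they are representative of the full table, which is not shown) and compares the rank of the model matrix with its number of columns. The names col_data, design_all, and overlap are made up for illustration; this mimics the kind of rank check behind the error rather than reproducing DESeq2's internals.

```r
# Batch/condition layout taken from the six colData rows shown in the question.
# Assumption: the rows not shown follow the same batch/condition pattern.
col_data <- data.frame(
  condition = factor(
    c("PH.7d", "ctr", "hblast.10.5", "hblast.18.5", "ctr", "PH.7d"),
    levels = c("ctr", "PH.7d", "hblast.10.5", "hblast.18.5")
  ),
  batch = factor(c(1, 1, 2, 2, 3, 3))
)

# Cross-tabulation: batch 2 shares no condition with batches 1 and 3.
print(table(batch = col_data$batch, condition = col_data$condition))

# Additive design as in the question.
design_all <- model.matrix(~ batch + condition, data = col_data)
rank_all <- qr(design_all)$rank
# Fewer independent columns than columns means the design is not full rank.
cat("all batches: columns =", ncol(design_all), ", rank =", rank_all, "\n")

# Keep only the batches whose conditions overlap (drop batch 2).
overlap <- droplevels(col_data[col_data$batch != "2", ])
design_sub <- model.matrix(~ batch + condition, data = overlap)
rank_sub <- qr(design_sub)$rank
cat("batches 1 and 3: columns =", ncol(design_sub), ", rank =", rank_sub, "\n")
```

With this layout the full design has one more column than its rank, because the batch 2 indicator can be written as the sum of the two hblast condition indicators. Restricted to batches 1 and 3, where ctr and PH.7d appear in both, the same formula gives a full-rank matrix.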
ATpoint wrote, 6 weeks ago:

From what I've seen, RNA-seq is strongly confounded by the kit and library preparation method. The confounding effect probably dominates any kind of biological difference; see here, for example, a PCA that I made from five independent data sources, processed identically on the in silico side.

Edit: Check whether a correctly applied batch removal, as Benn describes below, can limit the confounding effect.

[Image: PCA of five independent data sources]


But you can correct for batch when there are overlapping groups. However, I suspect that OP's batch 2 doesn't have any overlapping group...

Reply written 6 weeks ago by Benn

True, but to what extent? Do you have experience with how well this works? I mean, "mild" batch effects such as different culture conditions in the lab, samples taken on different days, or different sequencing protocols might be correctable, but can you really "regress out" the effect of different kits and laboratories?

Reply written 6 weeks ago by ATpoint

I only have experience with removeBatchEffect() from edgeR/limma; it works fine, especially for visualization. Clearly the limma::removeBatchEffect code from the OP did not work properly, as Friederike already suspects.

Reply written 6 weeks ago (modified 6 weeks ago) by Benn
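
To illustrate the point about overlapping groups, here is a minimal base R sketch on simulated data (not the OP's data) of the kind of linear-model batch removal that limma::removeBatchEffect() performs: fit condition plus batch, then subtract only the fitted batch term. All object names and the simulated effect sizes are assumptions for illustration. On a real expression matrix, the corresponding limma call would be along the lines of limma::removeBatchEffect(expr, batch = batch, design = model.matrix(~ condition)).

```r
set.seed(1)

# Simulated layout: 3 batches, each containing both conditions (overlapping groups).
n_genes   <- 50
condition <- factor(rep(c("ctr", "PH.7d"), times = 6), levels = c("ctr", "PH.7d"))
batch     <- factor(rep(1:3, each = 4))

# Simulated log-scale expression: noise + per-batch shift + condition effect
# on the first 10 genes (all values are made up).
expr <- matrix(rnorm(n_genes * length(batch)), nrow = n_genes)
batch_shift <- c(0, 2, -1)[as.integer(batch)]
expr <- sweep(expr, 2, batch_shift, "+")
is_ph <- condition == "PH.7d"
expr[1:10, is_ph] <- expr[1:10, is_ph] + 3

# Sum-to-zero coding for batch, so the removed term is centred across batches.
contrasts(batch) <- contr.sum(nlevels(batch))
B <- model.matrix(~ batch)[, -1, drop = FALSE]   # batch columns only
D <- model.matrix(~ condition)                    # effect to preserve
X <- cbind(D, B)

# Fit all genes at once (samples in rows), then subtract the fitted batch part.
fit        <- lm.fit(X, t(expr))
batch_coef <- fit$coefficients[-seq_len(ncol(D)), , drop = FALSE]
corrected  <- expr - t(B %*% batch_coef)

# Refit on the corrected matrix: the batch coefficients should now be ~0.
refit <- lm.fit(X, t(corrected))
max_batch_coef <- max(abs(refit$coefficients[-seq_len(ncol(D)), ]))
cat("largest remaining batch coefficient:", max_batch_coef, "\n")
```

This only works because every batch contains both conditions, so the batch and condition terms can be estimated separately. A matrix corrected this way is meant for visualization such as PCA; for differential expression testing, the usual recommendation is to keep the raw counts and put batch in the design formula instead.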