How to remove batch effect from RNA-seq data?
6.4 years ago
alesssia ▴ 570

Dear all,

I received RNAseq gene expression data that show batch effects for several technical confounders. I am performing a differential expression analysis using DESeq2, and have tried to add these effects as parameters to my design (~batch1+batch2+condition), but some of them are also linear combinations of the others, resulting in a model matrix that is not full rank.

Some people suggest using tools such as ComBat or SVA, but I am aware that the transformed values are no longer integer counts, and I wonder whether using these values will affect the DESeq2 results.

What is the correct way to remove batch effects in this case?

Thank you very much!

RNA-Seq batch DEseq2 combat SVA • 9.0k views
6.4 years ago
Michael Love ★ 2.3k

We have an example of using sva in the workflow: 

http://www.bioconductor.org/help/workflows/rnaseqGene/#batch

However, it sounds like you already know the batch. I don't understand how one would have multiple batch terms. What do these represent? What does your sample table look like (colData)?


I have GC content, date of sequencing, and primer index. Unfortunately, on some days only a few (2-3) samples were sequenced.


It appears to me that GC content is a gene-level confounding variable, while date of sequencing and primer index are sample-level confounders. It is therefore not clear to me how you would account for GC content at the sample level (as the DESeq2 design ~batch1+batch2+condition would indicate). However, DESeq2 allows you to control for gene-level confounders when estimating the size factors. Please see the DESeq2 documentation for details on how to do that.
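
For context, the distinction between sample-level and gene-level normalization can be written out using the model from the DESeq2 paper. This is a summary sketch, not a recipe: by default the factor s_ij is the same for every gene in a sample (s_ij = s_j, the size factor), while a gene-level covariate such as GC content would enter through a gene-by-sample matrix of normalization factors, which DESeq2 accepts via `normalizationFactors(dds)`.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_{ij}\, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr}\, \beta_{ir}
```

Here K_ij is the count for gene i in sample j, alpha_i is the gene-wise dispersion, and x_jr are the entries of the model matrix built from the design formula.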


Sorry if I answered too quickly without thinking. The main problem is that I have two conditions (cases and controls), and for about 80% of the samples either only cases or only controls were sequenced on a given day (and primer index makes things worse). I know that this is a really poor design, but that's the way it is. This yields a model matrix that is not full rank when I introduce the batches in the design.
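
The rank problem described above can be illustrated with a toy example. The sketch below is plain Python, not DESeq2 code, and the sample tables (`confounded`, `mixed`) and helper names are hypothetical. It builds a treatment-coded matrix for a design like ~day+condition and computes its rank: when every day contains only cases or only controls, the condition column is a linear combination of the intercept and day columns, so the matrix loses rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples):
    """Treatment-coded matrix for a design like ~ day + condition.

    samples: list of (day, condition) tuples.
    Columns: intercept, one indicator per non-reference day, 'case' indicator.
    """
    days = sorted({day for day, _ in samples})
    rows = []
    for day, cond in samples:
        row = [1]
        row += [1 if day == d else 0 for d in days[1:]]
        row.append(1 if cond == "case" else 0)
        rows.append(row)
    return rows


# Hypothetical layout: each day holds only cases or only controls.
confounded = [
    ("d1", "case"), ("d1", "case"),
    ("d2", "control"), ("d2", "control"),
    ("d3", "case"), ("d3", "case"),
]

# Same layout, except that day d3 now contains one case and one control.
mixed = confounded[:5] + [("d3", "control")]

for name, samples in [("confounded", confounded), ("mixed", mixed)]:
    X = design_matrix(samples)
    print(name, "rank", matrix_rank(X), "of", len(X[0]), "columns")
```

In this toy layout a single day that contains both conditions is enough to restore full rank, although the condition effect would then be estimated from very few within-day comparisons. A nested factor such as primer index can reintroduce the same kind of dependency.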
