What is a simple way to remove a known batch effect from RNA-seq data?
3
5
8.1 years ago
jack ▴ 950

Hi,

I have RNA-seq gene expression data for a specific cancer type, measured using two different technologies (Illumina GA and Illumina HiSeq). I checked for cross-platform bias (a batch effect) using a PCA plot. The samples from each technology cluster together, so I think I have a batch effect in my data.

What is a simple way to remove the batch effect? What will it do for me? Will the expression values change after removing the batch effect?

next-gen genomics R software-error RNA-Seq • 19k views
11
8.1 years ago

If the batch effect is very consistent (i.e., you have essentially the same bias in each sample within a batch), then you can just add it as a term in your design. So instead of fitting with ~condition+genotype (or whatever you're doing), you'd fit ~batch+condition+genotype.

ComBat/SVA/etc. are excellent tools, but they are mostly useful when the batch affects the samples variably rather than consistently.
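To make the "add batch to the design" idea concrete, here is a minimal base-R sketch on simulated data. Everything in it (the sample layout, the effect sizes, the variable names) is an assumption made up for illustration; a real analysis would pass the same kind of formula to the count-based tool being used rather than to lm().

```r
# Minimal base-R sketch on simulated data; all names and numbers are hypothetical.
set.seed(1)

# 12 samples: two platforms (the batch) and two conditions, balanced within batch
batch     <- factor(rep(c("GA", "HiSeq"), each = 6))
condition <- factor(rep(c("normal", "tumor"), times = 6))

# Simulated log-expression for one gene: baseline 5,
# a constant shift of +2 for HiSeq samples, and a true condition effect of +1
y <- 5 + 2 * (batch == "HiSeq") + 1 * (condition == "tumor") +
  rnorm(12, sd = 0.1)

# Design matrix with batch as an extra term, i.e. ~batch+condition
design <- model.matrix(~ batch + condition)
print(colnames(design))

# Fit the model: the batch shift gets its own coefficient,
# so the condition estimate is adjusted for it
fit <- lm(y ~ batch + condition)
print(coef(fit))
```

Note that with this approach the expression values themselves are not altered; the batch shift is estimated as its own coefficient and the condition effect is reported after accounting for it.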

0

I think it does affect the samples, because when I made the PCA plot, the samples from each technology clustered together. So should I go for ComBat/SVA?

1

Reread what I wrote. Of course the batch affects the samples (otherwise, you wouldn't be asking about this). That's not the deciding factor in whether to use a batch variable or whether one really needs to use ComBat.

3
8.1 years ago
Sam ★ 4.5k

You can try the ComBat algorithm. It was originally designed for microarray data, but I think some people also use it for RNA-seq data. Essentially, this tool tries to adjust your data to remove the batch effect. However, after the correction the values are no longer raw counts, so they may not be suitable for tools like DESeq or edgeR, which require raw read counts.

I am not sure whether you can correct for the batch effect within DESeq if you want to use it for the analysis. Maybe someone else can answer that part.

(Or maybe, you can follow the part with multi-factor design in this document)

0

The corrected values are only appropriate for tools like limma, since they're no longer integer counts.
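The point about corrected values no longer being integer counts can be seen with a toy example. The sketch below uses plain per-batch mean-centering on the log scale, which is a deliberately simplified stand-in and not the ComBat algorithm; the count matrix and sample names are hypothetical, and the approach assumes both batches contain a similar mix of conditions.

```r
# Toy count matrix: 3 genes x 6 samples (hypothetical numbers)
counts <- matrix(
  c( 10,  12,  11,  40,  44,  42,
    100,  90,  95, 210, 190, 200,
      5,   7,   6,   9,  11,  10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("s", 1:6))
)
batch <- factor(rep(c("GA", "HiSeq"), each = 3))

# Work on the log scale, as is usual for this kind of adjustment
logexpr <- log2(counts + 1)

# For each gene: subtract the batch mean, then add back the overall gene mean
corrected <- logexpr
for (b in levels(batch)) {
  cols <- batch == b
  corrected[, cols] <- logexpr[, cols] - rowMeans(logexpr[, cols]) +
    rowMeans(logexpr)
}

# Back on the count scale, the adjusted values are no longer integers
print(round(2^corrected - 1, 2))
```

After this kind of adjustment each gene has the same mean in both batches, but the back-transformed values are fractional, which is why they suit tools that model continuous log-expression rather than tools that expect raw counts.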

3
8.1 years ago
Manvendra Singh ★ 2.2k

Sam is right: ComBat was designed to remove batch effects between microarray samples, and it could be applicable in your case too.

I would suggest trying ComBat and also quantile normalization.

There is a built-in function for the latter in the limma package:

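library(limma)  # if needed, install with BiocManager::install("limma")
# data: numeric matrix of expression values, genes in rows, samples in columns
normalized <- normalizeQuantiles(as.matrix(data), ties = TRUE)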

Draw a boxplot for each method, i.e., ComBat and quantile normalization.

Then see which one has normalized your data better and go with it.
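A before/after boxplot comparison along these lines could look like the sketch below. It uses simulated log-expression values and a small hand-written quantile normalization in base R as a stand-in for limma's normalizeQuantiles (without its tie handling); the matrix, the batch labels, and the quantile_normalize helper are all hypothetical.

```r
set.seed(42)

# Simulated log-expression: 200 genes x 6 samples, last 3 samples shifted by +1.5
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 2), nrow = 200)
expr[, 4:6] <- expr[, 4:6] + 1.5
colnames(expr) <- paste0("s", 1:6)
batch <- factor(rep(c("GA", "HiSeq"), each = 3))

# Simple quantile normalization: each value is replaced by the mean,
# across samples, of the values holding the same rank
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}
expr_qn <- quantile_normalize(expr)

# Side-by-side boxplots, one box per sample, coloured by batch
batch_cols <- c("grey80", "steelblue")[batch]
op <- par(mfrow = c(1, 2))
boxplot(expr,    col = batch_cols, las = 2, main = "Before")
boxplot(expr_qn, col = batch_cols, las = 2, main = "After quantile normalization")
par(op)
```

Boxplots only compare the per-sample distributions, so re-running the PCA plot from the question on the adjusted values is a useful complementary check.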
