Question: What is a simple way to remove a known batch effect from RNA-seq data?
jack750 (Germany) wrote, 4.4 years ago:

Hi,

I have RNA-seq gene expression data for a specific cancer type, measured on two different platforms (Illumina GA and Illumina HiSeq). I checked for cross-platform bias (a batch effect) using a PCA plot, and the samples from each platform cluster together, so I think my data has a batch effect.
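A minimal base-R sketch of that PCA check, assuming `expr` is a log-scale expression matrix (genes in rows, samples in columns) and `platform` is a factor giving each sample's technology. The data are simulated, with a hypothetical per-gene shift added to the HiSeq samples, purely to show what platform-driven clustering looks like.

```r
set.seed(1)
# Simulated log-expression: 200 genes x 8 samples (hypothetical data)
n_genes  <- 200
platform <- factor(rep(c("GA", "HiSeq"), each = 4))
expr <- matrix(rnorm(n_genes * 8, mean = 5), nrow = n_genes)

# Mimic a batch effect: every HiSeq sample gets the same per-gene shift
shift <- rnorm(n_genes, sd = 2)
hiseq <- platform == "HiSeq"
expr[, hiseq] <- expr[, hiseq] + shift

# prcomp() expects samples in rows, so transpose; genes are centred
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Mean PC1 score per platform: opposite signs mean PC1 splits the platforms
tapply(pc1, platform, mean)
# Share of total variance explained by PC1 and PC2
summary(pca)$importance["Proportion of Variance", 1:2]
```

Plotting `pca$x[, 1:2]` coloured by `platform` gives the usual PCA plot; if PC1 simply separates GA from HiSeq, the platform is the dominant source of variation.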

What is a simple way to remove the batch effect? What will it do for me? Will the expression values change after removing it?

Devon Ryan (Freiburg, Germany) wrote, 4.4 years ago:

If the batch effect is very consistent (i.e., you have essentially the same bias in every sample within a batch), then you can simply add batch as a term in your design. So instead of fitting ~condition+genotype (or whatever design you're using), you'd fit ~batch+condition+genotype.
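To make the design-formula idea concrete, here is a minimal base-R sketch for a single gene. It assumes a batch that adds the same offset to every sample in it; the `batch`, `condition` and `y` values are hypothetical and simulated. In a real analysis the same formula would go into the model design used by DESeq2, edgeR or limma rather than into `lm()`.

```r
set.seed(3)
# Hypothetical layout: 2 batches x 2 conditions, 3 replicates each
batch     <- factor(rep(c("GA", "HiSeq"), each = 6))
condition <- factor(rep(rep(c("control", "tumour"), each = 3), times = 2))

# Simulated log-expression for one gene: baseline 5, tumour effect +1,
# constant HiSeq offset +3, plus a little noise
y <- 5 + 1 * (condition == "tumour") + 3 * (batch == "HiSeq") +
  rnorm(12, sd = 0.2)

# Adding batch to the formula adds one column to the design matrix
design <- model.matrix(~ batch + condition)
colnames(design)

fit_batch   <- lm(y ~ batch + condition)  # batch offset gets its own coefficient
fit_nobatch <- lm(y ~ condition)          # batch offset ends up in the residuals

coef(fit_batch)["conditiontumour"]
c(with_batch = sigma(fit_batch), without_batch = sigma(fit_nobatch))
```

Because this toy design is balanced, both fits estimate a similar tumour effect, but without the batch term the platform offset is left in the residuals, inflating the residual error and reducing power to detect real differences.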

ComBat, SVA, etc. are excellent tools, but they are mostly useful when the batch affects the samples within it to varying degrees.


I think it does affect the samples, because in my PCA plot the samples from each platform clustered together. So should I use ComBat/SVA?

(reply by jack750, 4.4 years ago)

Reread what I wrote. Of course the batch affects the samples (otherwise you wouldn't be asking about this); that isn't the deciding factor. What decides between adding a batch variable to the design and really needing ComBat is whether the batch affects every sample within it in the same way.

(reply by Devon Ryan, 4.4 years ago)
Sam (London) wrote, 4.4 years ago:

You can try the ComBat algorithm. It was originally designed for microarray data, but I think some people also use it for RNA-seq data. Essentially, this tool adjusts your data to remove the batch effect. However, after the correction the values are no longer raw counts, so they may not be suitable for tools like DESeq or edgeR, which require raw read counts.
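ComBat itself (the `ComBat()` function in the Bioconductor `sva` package) fits a per-gene location/scale model for each batch and shrinks the batch estimates with empirical Bayes. The sketch below is a much simpler stand-in, not ComBat: it only removes each gene's per-batch mean on log-scale data and adds back the gene's overall mean, and it assumes batch is not confounded with the conditions you want to compare. The counts and the helper `adjust_batch_means()` are hypothetical. It does illustrate the point above: the adjusted values are no longer integer counts.

```r
set.seed(2)
# Hypothetical raw counts: 5 genes x 6 samples from two platforms
counts <- matrix(rpois(30, lambda = 100), nrow = 5)
batch  <- factor(rep(c("GA", "HiSeq"), each = 3))
hiseq  <- batch == "HiSeq"
counts[, hiseq] <- counts[, hiseq] * 2L  # mimic a platform-wide shift

# Work on the log scale, as ComBat-style adjustments usually do
logc <- log2(counts + 1)

# Hypothetical helper: subtract each gene's batch mean, add back its overall mean
adjust_batch_means <- function(x, batch) {
  out <- x
  gene_means <- rowMeans(x)
  for (b in levels(batch)) {
    cols <- batch == b
    out[, cols] <- x[, cols] - rowMeans(x[, cols, drop = FALSE]) + gene_means
  }
  out
}

adj <- adjust_batch_means(logc, batch)

# Per-gene batch means now agree across the two platforms
round(rowMeans(adj[, batch == "GA"]) - rowMeans(adj[, hiseq]), 10)
# ...and the adjusted values are not integer counts any more
all(adj == round(adj))
```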

I am not sure whether you can correct for the batch effect within DESeq itself, if you want to use it for the analysis. Maybe someone else can answer that part.

(Or maybe you can follow the section on multi-factor designs in this document.)


The corrected values are only appropriate for tools like limma, since they're no longer integer counts.

(reply by Devon Ryan, 4.4 years ago)
Manvendra Singh (Berlin, Germany) wrote, 4.4 years ago:

Sam is right: ComBat is designed to remove batch effects between microarray samples, and it could be applicable in your case too.

I would suggest using ComBat and also trying quantile normalization.

There is a built-in function for quantile normalization in the limma package:

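library(limma)   ### install with BiocManager::install("limma") (older Bioconductor releases used biocLite)
# data: numeric matrix of expression values, genes in rows, samples in columns
normData <- normalizeQuantiles(data, ties = TRUE)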
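For intuition about what `normalizeQuantiles()` does, here is a simplified base-R version of quantile normalization (a sketch, not limma's code): each sample's sorted values are replaced by the mean of the sorted values across all samples, so every column ends up with the same distribution. Ties are broken by order of appearance here, whereas limma's `ties = TRUE` gives tied values a common normalized value. The matrix `x` and the helper `quantile_normalize()` are hypothetical.

```r
# Hypothetical helper: simplified quantile normalization (ties broken by position)
quantile_normalize <- function(x) {
  # Reference distribution: mean of the k-th smallest value across samples
  ref <- rowMeans(apply(x, 2, sort))
  out <- x
  for (j in seq_len(ncol(x))) {
    # Rank of each value within its own column
    r <- rank(x[, j], ties.method = "first")
    out[, j] <- ref[r]
  }
  out
}

# Hypothetical log-expression: 4 genes x 3 samples
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4,
            dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

xn <- quantile_normalize(x)
xn
apply(xn, 2, sort)  # every column now has the same sorted values
```

Running `boxplot(x)` and `boxplot(xn)` shows the effect: after normalization the three boxes are identical, which is why a boxplot alone cannot tell you whether biological differences were preserved.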

Draw a boxplot of the data after each method (ComBat and quantile normalization), then see which one has normalized your data better and go with it.
