Question: RNA-seq batch effect due to sequencing platform
5.6 years ago, Ashutosh Pandey (12k) wrote:

Background: I have to perform a differential gene expression analysis using RNA-seq data. We have two genotypes, with RNA-seq data for control and after treatment for both genotypes, and two biological replicates for each case. In short, we have eight samples: four before treatment (2 genotypes x 2 replicates) and four after. Our goal is to find genes that show differential expression between the genotypes after the treatment. Note: we actually have several treatments, but I have tried to keep the question simple.

Problem: Now the problem is that each biological replicate was run on a different sequencing platform including Ion Proton and SOLiD Wildfire. Trust me it wasn't my idea. Don't kill the messenger (bioinformatician) :-)

Now we see a huge difference in expression counts between biological replicates that is purely due to the batch (platform) effect. PCA clusters the samples by platform rather than by treatment or strain. The Ion Proton samples always show higher read counts. The same applies to RPKM values, so the problem is not caused by a difference in sequencing depth. The batch effect is not consistent across pairs of biological replicates: the correlation between counts from the two platforms (that is, between biological replicates) ranges from 0.3 to 0.6 depending on the case. I can use the batch as a covariate in my DESeq2 analysis, but:

A) Is there a better approach to remove the variation due to the different sequencing platforms? The reason I ask is that we have samples from multiple treatments and may later need to merge reads from nearly identical treatments into one. Scaling or correcting the values would therefore be preferable, so that the corrected counts can be merged.

B) Should I perform the correction at the level of each biological replicate pair, or should I create two groups (Wildfire and Ion Proton) and perform batch correction using all the samples (4 Wildfire and 4 Ion Proton; I actually have many more samples on each platform because of the multiple treatments, but I mentioned only two conditions to keep the question simple)?

C) I have never used ComBat, but I read that it doesn't work for small sample sizes, so I may need to carry out batch correction using all the samples even though the batch effect is inconsistent. Also, as ComBat takes log-transformed normalized data as input, I won't be able to use its output as input for DESeq2. I may have to use limma, right?

Please excuse me if I haven't used the correct terminology. I am new to this.


batch effect rna-seq • 3.4k views

Hey Ashutosh,


How about using surrogate variable analysis to remove the batch effect? The sva package from Bioconductor includes ComBat for removing batch effects.

Alternatively, I think quantile normalization of your log-transformed counts per million would also help in your case.

written 5.6 years ago by Manvendra Singh (2.1k)
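
The quantile normalization of log-transformed counts per million suggested above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the counts are made up, the function names `log_cpm` and `quantile_normalize` are hypothetical, and the 0.5 offset is simply one common way to avoid taking the log of zero. In practice one would more likely use an established Bioconductor implementation.

```python
import math

def log_cpm(counts, prior=0.5):
    """Convert one sample's raw counts to log2 counts per million.
    The small offset (an assumption of this sketch) avoids log2(0)."""
    library_size = sum(counts)
    return [math.log2((c + prior) / (library_size + 1.0) * 1e6) for c in counts]

def quantile_normalize(samples):
    """Map every sample (one value per gene) onto a shared distribution.
    Ties are broken by gene order, which is a simplification."""
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)
    ]
    normalized = []
    for s in samples:
        # Genes ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical raw counts for five genes on the two platforms.
ion_proton = [900, 150, 3000, 40, 600]
wildfire = [200, 90, 500, 10, 30]

normalized = quantile_normalize([log_cpm(ion_proton), log_cpm(wildfire)])
```

After this step both samples contain exactly the same set of values, and each gene keeps its rank within its own sample. Note that this only aligns the overall distributions; a platform bias that changes the rank of individual genes would remain.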

Thanks Manvendra.

written 5.6 years ago by Ashutosh Pandey (12k)

RUVSeq worked very well on my dataset. Maybe you can give it a try.

written 5.4 years ago by geek_y (11k)
5.6 years ago, andrew.j.skelton73 (6.1k) wrote:

As far as I'm aware, svaseq will only identify potential technical variation, not correct for it (though this may have changed since I last looked). What analyses are you carrying out? I think you will have to tackle this not through "batch correction" but by accounting for the batch in the model design. If you're using DESeq2, for example, include the batch as a term in the design formula.
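
To illustrate what including the batch as a model term does, here is a minimal sketch using ordinary least squares on log-scale values for a single gene. It is not what DESeq2 does internally (DESeq2 fits a negative binomial model on counts); in DESeq2 the equivalent idea would be a design formula along the lines of `~ platform + treatment`, where both column names are hypothetical. The expression values and helper names below are invented for the example.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Design columns: intercept, platform (1 = Ion Proton), treatment (1 = treated).
design = [
    [1, 0, 0], [1, 0, 0], [1, 0, 1], [1, 0, 1],  # SOLiD Wildfire samples
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1],  # Ion Proton samples
]
# Hypothetical log2 expression of one gene in the eight samples.
y = [5.0, 5.2, 6.1, 5.9, 8.1, 7.9, 9.0, 9.0]

intercept, platform_effect, treatment_effect = fit_ols(design, y)
print(f"platform effect:  {platform_effect:.2f}")
print(f"treatment effect: {treatment_effect:.2f}")
```

Because the platform shift is absorbed by its own coefficient, the treatment coefficient reflects the within-platform difference rather than the much larger gap between platforms. The original counts stay untouched, which is why this approach is usually described as accounting for the batch rather than correcting it.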


Thanks Andrew.

written 5.6 years ago by Ashutosh Pandey (12k)