Question: Batch effects in sequencing data
3.3 years ago
Andrew wrote:

I am looking for biases in sequencing data that may appear at any step of the sequencing process and are caused by batch effects, for example the impact of library prep on the resulting data. I know where these biases can arise; I just don't know what kind of impact they have (on coverage, GC content, etc.).

Does anyone know of any papers that discuss this? I have found a few, but mostly they just mention that they correct for batch effects without saying what exactly they are correcting.

Any help is much appreciated.

3.3 years ago
Asaf wrote:

When comparing expression levels with RNA-seq, you have to make sure the library prep and sequencing are the same across samples. Issues such as ligation bias and RNA fragment length can influence the number of reads each mRNA gets in the sequencing results, even if the initial amount of mRNA was the same. The easiest way to correct for a batch effect is to add the batch as a column in the DESeq2 sample table and as a term in the design formula (e.g. ~ batch + condition). Differences between samples that can be explained by the batch are then attributed to the batch term rather than to the condition when the condition effect is estimated.
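
To make the idea of "adding the batch to the model" concrete, here is a minimal sketch in plain Python. It is not DESeq2: ordinary least squares on made-up log-scale values stands in for DESeq2's negative binomial GLM, and the sample layout, effect sizes and helper names (solve, ols) are assumptions chosen only for illustration.

```python
# Toy illustration only: made-up log-scale expression values for one gene.
# Ordinary least squares stands in for DESeq2's negative binomial GLM.

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical design: batch 0 is mostly control, batch 1 is mostly treated.
condition = [0, 0, 0, 1, 0, 1, 1, 1]
batch = [0, 0, 0, 0, 1, 1, 1, 1]
# Simulated values: baseline 5, true condition effect 1, batch effect 2, no noise.
y = [5.0 + 1.0 * c + 2.0 * b for c, b in zip(condition, batch)]

# Naive estimate: difference of group means, ignoring batch.
treated = [v for v, c in zip(y, condition) if c == 1]
control = [v for v, c in zip(y, condition) if c == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

# Adjusted estimate: fit y ~ 1 + condition + batch, read the condition coefficient.
X = [[1.0, float(c), float(b)] for c, b in zip(condition, batch)]
intercept, adjusted, batch_effect = ols(X, y)

print(f"naive condition effect:    {naive:.2f}")
print(f"adjusted condition effect: {adjusted:.2f}")
```

In this toy data the naive difference of means is inflated by the batch imbalance, while the fit that includes the batch term recovers the simulated condition effect. Note that this only works when batch and condition are not completely confounded; if every treated sample is in one batch and every control in the other, no model can separate the two.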

3.3 years ago
New Zealand
Thomas Johnson wrote:

I have seen some large batch effects between batches run with different versions of Illumina chemistry. With the older chemistry on the HiSeq we see many regions with very low coverage, and these regions tend to be GC-rich. More recently we have done more HiSeq runs, but with PCR-free libraries, and a lot of these problems go away. We have had some success correcting for this batch effect in GWAS by introducing the batch number as a covariate.
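
One rough way to check for this kind of GC-related coverage dropout in your own batches is to compare mean depth in GC-rich windows against the rest. The sketch below uses tiny made-up windows and depths; the function names, the batch labels and the 0.6 GC cutoff are assumptions for illustration, not values from any real run.

```python
def gc_fraction(seq):
    """Fraction of G or C among called bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0

def gc_coverage_ratio(windows, gc_cutoff=0.6):
    """Mean depth of GC-rich windows divided by mean depth of all other windows."""
    rich = [depth for seq, depth in windows if gc_fraction(seq) >= gc_cutoff]
    rest = [depth for seq, depth in windows if gc_fraction(seq) < gc_cutoff]
    if not rich or not rest:
        raise ValueError("need windows on both sides of the GC cutoff")
    return (sum(rich) / len(rich)) / (sum(rest) / len(rest))

# Hypothetical (window sequence, mean depth) pairs for the same windows in two batches.
old_chemistry = [("GCGCGGCC", 4), ("GGCCGCGA", 6), ("ATATTAGC", 30), ("ATTTAAGC", 30)]
pcr_free = [("GCGCGGCC", 28), ("GGCCGCGA", 30), ("ATATTAGC", 30), ("ATTTAAGC", 30)]

ratio_old = gc_coverage_ratio(old_chemistry)
ratio_new = gc_coverage_ratio(pcr_free)

print(f"old chemistry GC-rich/other depth ratio: {ratio_old:.2f}")
print(f"PCR-free GC-rich/other depth ratio:      {ratio_new:.2f}")
```

A ratio well below 1 in one batch but close to 1 in another suggests GC-dependent dropout that differs by batch. On real data you would use much larger windows (hundreds of bases or more) and depths computed from the alignments.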

