Question

edgeR: Correcting for lane effects in addition to replicate batch effects?

8.6 years ago

Ekarl2 ▴ 120

I have 16 samples: 4 environments and 4 biological replicates per environment.

The first replicate for each environment was processed together, the second replicate for each environment was processed together and so on. I have included this batch factor in my RNA-seq differential expression analysis.

There is also another possible batch effect: replicates 1 and 2 of every environment were sequenced in one lane, and replicates 3 and 4 of every environment were sequenced in another lane (henceforth the lane effect).

In other words, environments are equally represented in both the processing and sequencing.

However, I am considering whether to include this lane effect as a factor in the additive model for my differential expression analysis with edgeR. I would like to include it for the sake of stringency and consistency, but I am concerned about the risk of, e.g., overfitting (or any other downside to including it).

Is there a good way to find out whether it is defensible to include this lane factor? Are there any circumstances where including it would be inadvisable? Is it just a matter of testing for differential expression between lanes, seeing how many genes are affected, and then making a qualitative judgement call?

I am particularly interested in whether there are any downsides to including this lane effect even if the number of genes that are differentially expressed between lanes is low (but not very low) or moderate.

edgeR batch-effect lane-effect • 3.0k views

updated 19 months ago by Ram 43k • written 8.6 years ago by Ekarl2 ▴ 120

Ram · Answer 1 · 2015-09-23

Yes, you could test whether the lane effect is significant (it probably is not). The main downside of including a term that adds nothing to the model is that it uses up one degree of freedom, which slightly reduces power. I suspect that your lane effect is small, but you have enough samples that including the lane term is unlikely to hurt your results noticeably.
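
A minimal sketch of how one could check what the lane term costs in degrees of freedom, assuming the layout described in the question (4 environments crossed with 4 processing batches, with batches 1 and 2 on one lane and batches 3 and 4 on the other) and treatment-coded indicator columns. This is plain Python rather than edgeR code, and the helper names (`matrix_rank`, `design_row`) are hypothetical; it only builds the additive design matrix and computes its rank exactly.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(env, batch, include_lane):
    """One row of an additive design: intercept + environment + batch (+ lane)."""
    row = [1]                                     # intercept
    row += [int(env == k) for k in (2, 3, 4)]     # environment, level 1 as reference
    row += [int(batch == k) for k in (2, 3, 4)]   # batch, level 1 as reference
    if include_lane:
        # Assumption from the question: batches 3 and 4 ran on the second lane.
        row.append(int(batch in (3, 4)))
    return row


# 16 samples: every environment appears once in every processing batch.
samples = list(product(range(1, 5), range(1, 5)))
without_lane = [design_row(e, b, False) for e, b in samples]
with_lane = [design_row(e, b, True) for e, b in samples]

rank_without = matrix_rank(without_lane)
rank_with = matrix_rank(with_lane)
residual_df = len(samples) - rank_without

print("columns without lane:", len(without_lane[0]), "rank:", rank_without)
print("columns with lane:   ", len(with_lane[0]), "rank:", rank_with)
print("residual df for environment + batch:", residual_df)
```

Under these assumptions both ranks come out as 7, leaving 9 residual degrees of freedom. The lane column equals the sum of the batch 3 and batch 4 indicators, so the batch factor already absorbs it and it cannot be estimated separately. The one-degree-of-freedom cost mentioned above would apply to a lane term that is not nested in a factor already in the model.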