I have 16 samples: 4 environments and 4 biological replicates per environment.
The first replicates of all four environments were processed together, the second replicates were processed together, and so on. I have included this processing batch as a factor in my RNA-seq differential expression analysis.
There is also a second possible batch effect: replicates 1 and 2 of all environments were sequenced in one lane, and replicates 3 and 4 of all environments were sequenced in another lane (henceforth the lane effect).
In other words, all environments are equally represented in every processing batch and in both sequencing lanes.
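To make the layout concrete, here is a minimal Python sketch of the sample table. It assumes hypothetical environment labels E1 to E4 and treats the replicate number as the processing batch, as described above. It counts samples per environment-by-batch and environment-by-lane cell to check the balance.

```python
from collections import Counter
from itertools import product

# Hypothetical labels; the replicate number doubles as the processing batch.
environments = ["E1", "E2", "E3", "E4"]
replicates = [1, 2, 3, 4]

samples = []
for env, rep in product(environments, replicates):
    # Replicates 1+2 went to one lane, replicates 3+4 to the other.
    lane = "L1" if rep in (1, 2) else "L2"
    samples.append({"environment": env, "batch": rep, "lane": lane})

# Number of samples in each (environment, batch) and (environment, lane) cell.
env_by_batch = Counter((s["environment"], s["batch"]) for s in samples)
env_by_lane = Counter((s["environment"], s["lane"]) for s in samples)

print(len(samples))                # total number of samples
print(set(env_by_batch.values()))  # every environment once per batch
print(set(env_by_lane.values()))   # every environment twice per lane
```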
I am considering whether to include this lane effect as a factor in the additive model for my differential expression analysis with edgeR. I would like to include it for stringency and consistency, but I am concerned about the risk of, e.g., overfitting (or any other downside of including it).
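In edgeR the design would typically be built in R with something like `model.matrix(~ batch + lane + environment)`. The sketch below is not edgeR; it is a rough Python/numpy imitation that assumes R's default treatment coding (the first level of each factor is the reference) and the layout described above. It builds the additive design matrix with and without a lane column and compares the number of columns with the matrix rank.

```python
import numpy as np

environments = ["E1", "E2", "E3", "E4"]  # hypothetical labels
replicates = [1, 2, 3, 4]                # replicate number = processing batch

env_col, batch_col, lane_col = [], [], []
for env in environments:
    for rep in replicates:
        env_col.append(env)
        batch_col.append(rep)
        lane_col.append(1 if rep in (1, 2) else 2)  # replicates 1+2 -> lane 1

def dummies(values, levels):
    """Treatment coding: one 0/1 column per non-reference level."""
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels[1:]]
                     for v in values])

intercept = np.ones((len(env_col), 1))
X_env = dummies(env_col, environments)  # 3 columns
X_batch = dummies(batch_col, replicates)  # 3 columns: batch 2, 3, 4
X_lane = dummies(lane_col, [1, 2])  # 1 column: lane 2

X_without_lane = np.hstack([intercept, X_batch, X_env])       # ~ batch + environment
X_with_lane = np.hstack([intercept, X_batch, X_lane, X_env])  # ~ batch + lane + environment

for name, X in [("without lane", X_without_lane), ("with lane", X_with_lane)]:
    print(name, "columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))
```

In this sketch the lane column equals the sum of the batch-3 and batch-4 columns, so adding it raises the column count without raising the rank: under this layout, lane is fully nested within processing batch.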
Is there a good way to find out whether it is defensible to include this lane factor? Are there any circumstances in which including it would be inadvisable? Or is it just a matter of testing for differential expression between lanes, seeing whether many genes differ, and then making a qualitative judgement call?
I am particularly interested in whether there are any downsides to including this lane effect even if the number of genes that are differentially expressed between lanes is low (but not very low) or moderate.