Hello
I ran an RNA-seq experiment with three biological replicates (batches), three conditions, and two time points (not a time series). The conditions are exposure to chemical A, B, or A+B, sampled at times 20 and 60. I want to identify the genes that are differentially expressed in both (A vs A+B) and (B vs A+B), i.e. the intersection of the two comparisons. My question is how best to handle the batch effects, which are pronounced. Should I correct for batch across all samples at once, or is it better to subset the data first? I could subset by time point (20 and 60), or I could run the batch correction separately for each comparison, such as (A vs A+B).
Thank you!
It depends on your analysis goal. Since the experimental design is good (all groups are present in all batches), you can do either. If you want to compare 20 to 60, keep them in one analysis. Generally, I would not split unless absolutely necessary, because it adds complexity and leaves you with two separate analyses.
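To illustrate why "all groups in all batches" matters, here is a minimal sketch in plain Python. The numbers are made up for a single hypothetical gene at one time point, and the helper names (`is_balanced`, `blocked_effect`) are my own, not from any package. It checks that the design is balanced and then compares conditions within each batch, so a batch-wide shift cancels out. In a real analysis the same idea is usually expressed by adding batch as a term in the model design (for example `~ batch + condition` in DESeq2) rather than by hand.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: log2 expression of ONE gene at ONE time point.
# Keys are (batch, condition). Values are invented for illustration:
# each batch carries its own offset on top of the condition effect.
samples = {
    ("rep1", "A"): 5.0, ("rep1", "B"): 5.5, ("rep1", "A+B"): 6.0,
    ("rep2", "A"): 7.0, ("rep2", "B"): 7.5, ("rep2", "A+B"): 8.0,
    ("rep3", "A"): 4.0, ("rep3", "B"): 4.5, ("rep3", "A+B"): 5.0,
}


def is_balanced(keys):
    """Return True if every condition occurs equally often in every batch."""
    keys = list(keys)
    counts = Counter(keys)
    batches = {batch for batch, _ in keys}
    conditions = {cond for _, cond in keys}
    # Missing (batch, condition) cells count as 0 and break the balance.
    per_cell = {counts[(b, c)] for b in batches for c in conditions}
    return len(per_cell) == 1 and 0 not in per_cell


def blocked_effect(data, treatment, reference):
    """Mean of the within-batch differences (treatment - reference)."""
    batches = sorted({batch for batch, _ in data})
    diffs = [data[(b, treatment)] - data[(b, reference)] for b in batches]
    return mean(diffs), diffs


if __name__ == "__main__":
    print("balanced design:", is_balanced(samples))
    for reference in ("A", "B"):
        effect, diffs = blocked_effect(samples, "A+B", reference)
        print(f"A+B vs {reference}: per-batch differences {diffs}, mean {effect}")
```

The raw values for A differ a lot between replicates, but the within-batch differences are consistent, which is what treating batch as a blocking factor exploits. If a condition were missing from one batch, `is_balanced` would return False and this simple within-batch comparison would no longer be possible for that batch.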
Just want to clarify something. Are biological replicates considered different batches? I thought batches referred to data collected from separate protocols. For example,
1) If my lab collects 1 sample from each of 3 mice (total = 3 samples) at the same time with the same protocol, sends them for sequencing with the same protocol, and then analyses the sequencing data with the same protocol, I would have 3 biological replicates and not 3 batches.
2) If 1 of the mice came from a different lab and 2 from my own lab, then I would have 2 batches and would need to consider batch effect.
Is my understanding wrong?
Yes. A batch is any source of unwanted technical variation. That can be experiments done on different days, with different kits, by different labs, a combination of all of these, or any other factor that does not represent your biological readout of interest.
Thank you ATpoint
Then I'm struggling to see how batch effect pertains to the experiment as described.
"3 biological replicates (batches)"... I'm failing to see how these are batches. Why is batch effect a concern here? 3 replicates, all subject to the same 3 conditions and the same 2 time points. I'm not seeing it.
Am I to understand that even when every sample is treated equally, we should still always run batch effect removal software?
Thanks, and I appreciate the help with clarifying my understanding of when to be mindful of batch effect.