15 months ago
LauferVA
Suppose you generated a dataset that you were really excited about, but had to choose one of the following two options:
1) the data are afflicted by a weak batch effect of some kind, and this batch effect is collinear with an important variable (either the response variable or one of the most important predictors)
OR
2) the data are affected by a strong batch effect, but this batch effect is not significantly correlated with the response variable or any of the most important covariates you plan to include in the model.
Would you choose 1 or 2, and why?
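To make the trade-off concrete, here is a minimal simulation sketch of the two options. Everything in it is an assumption for illustration only: the sample size, the effect sizes (0.3 for the "weak" batch effect, 3.0 for the "strong" one), the perfectly collinear batch in option 1, the perfectly balanced batch in option 2, and the helper name `ols`. It uses a plain least-squares fit, not any particular batch-correction method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400                       # hypothetical sample size
true_effect = 1.0             # hypothetical effect of the variable of interest
group = np.repeat([0.0, 1.0], n // 2)   # variable of interest (two conditions)
noise = rng.normal(0.0, 1.0, n)         # same noise reused in both scenarios


def ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, design rank)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef, rank


# Option 1: weak batch effect, batch identical to the variable of interest.
batch1 = group.copy()
y1 = true_effect * group + 0.3 * batch1 + noise
coef1, _ = ols([group], y1)            # batch left out: its effect is absorbed by group
_, rank1 = ols([group, batch1], y1)    # batch included: design matrix loses a rank

# Option 2: strong batch effect, batch balanced within each condition.
batch2 = np.tile([0.0, 1.0], n // 2)
y2 = true_effect * group + 3.0 * batch2 + noise
coef2, rank2 = ols([group, batch2], y2)  # batch included as a covariate

print("option 1: group estimate =", coef1[1], "| design rank with batch =", rank1)
print("option 2: group estimate =", coef2[1], "| design rank with batch =", rank2)
```

Under these assumptions, the design matrix in option 1 is rank-deficient once batch is added, so the batch contribution cannot be separated from the effect of interest no matter how small it is. In option 2 the batch term is estimable and can be adjusted for, at the cost of one extra coefficient.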
It depends on what kind of dataset you are talking about.
Is it gene expression data? If so, is the batch variable known, or do you have to estimate it from the data?
It's hypothetical. Would it make a difference to you whether it is expression data or something else? If so, that would be a great subject for a response! :-)