Remove batch effects on the train set to avoid information leakage
8 months ago
JACKY ▴ 140

I want to apply limma's removeBatchEffect function to my data, but only after splitting it into train and test sets. I'm aware that batch correcting before the split can leak information from the test set into the training set, so I want to avoid that. Until now, I've been batch correcting my entire dataset as follows:

cancer.type = metdata$Cancer_Type
correctedTPM = limma::removeBatchEffect(TPM, batch = cancer.type)

I'd like to change my approach: first correct the training set, then use the parameters estimated on the training set to correct the test set. This is analogous to best practice for data scaling, where the scaling parameters are learned on the training set only. Is there a way to do this in R with removeBatchEffect or another technique?
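
To make the idea concrete, here is a minimal base R sketch of "estimate on train, apply to test". It is not limma's API: fit_batch_offsets and apply_batch_offsets are hypothetical helper names, and the toy matrix and batch labels are made up for illustration. It assumes log-scale expression values with genes in rows and samples in columns, a batch factor that is a technical variable, no additional covariates, and that every batch in the test set also appears in the training set. It only learns a per-gene location shift for each batch, so treat it as a sketch of the workflow rather than a replacement for removeBatchEffect.

```r
# Hypothetical helpers (not part of limma): learn per-gene batch offsets on
# the training samples only, then apply those same offsets to other samples.
# Assumes rows are genes, columns are samples, values are on a log scale.

fit_batch_offsets <- function(x_train, batch_train) {
  batch_train <- factor(batch_train)
  lv <- levels(batch_train)
  # Per-gene mean within each training batch.
  batch_means <- vapply(
    lv,
    function(b) rowMeans(x_train[, batch_train == b, drop = FALSE]),
    numeric(nrow(x_train))
  )
  # Force a genes x batches matrix even when there is a single gene.
  batch_means <- matrix(batch_means, nrow = nrow(x_train),
                        dimnames = list(rownames(x_train), lv))
  # Reference level per gene: unweighted mean of the batch means.
  reference <- rowMeans(batch_means)
  # Offset = distance of each batch mean from the reference.
  batch_means - reference
}

apply_batch_offsets <- function(x, batch, offsets) {
  batch <- as.character(batch)
  unseen <- setdiff(batch, colnames(offsets))
  if (length(unseen) > 0) {
    stop("batch levels not seen in training: ",
         paste(unseen, collapse = ", "))
  }
  # Subtract, for every sample, the offset learned for its batch.
  x - offsets[, batch, drop = FALSE]
}

# Toy data (made up): 6 genes, 8 samples, batch B shifted by +2.
set.seed(1)
x <- matrix(rnorm(6 * 8), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
batch <- rep(c("A", "B"), each = 4)
x[, batch == "B"] <- x[, batch == "B"] + 2

train_idx <- c(1, 2, 3, 5, 6, 7)
test_idx  <- c(4, 8)

# Offsets are estimated from the training samples only.
offsets <- fit_batch_offsets(x[, train_idx], batch[train_idx])
train_corrected <- apply_batch_offsets(x[, train_idx], batch[train_idx], offsets)
test_corrected  <- apply_batch_offsets(x[, test_idx], batch[test_idx], offsets)
```

The design choice here is that the test samples never contribute to the offsets, so nothing about them can leak into the corrected training data. The stop() on unseen batch levels is deliberate: a batch that only exists in the test set has no training-derived parameters, so it cannot be corrected this way.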

r limma batch-effect • 552 views

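I've seen bad experimental designs where biological variables get confounded with sequencing batches, but this is the first time I'm encountering wanton disregard for biology and abuse of batch correction techniques. Cancer type is a biological variable, not a batch, so passing it as the batch argument removes the differences between cancer types from the data.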
