I am working with a proteome dataset and would like to check for and correct batch effects. For batch effect detection I am using PVCA; for batch effect correction I am using limma's removeBatchEffect. The data come from a DIA proteome analysis and were quantified using Biognosys Spectronaut 11; details can be found in our first paper about it. Quantitative data were log2-transformed before analysis and correction.
Using PVCA I get the following result for my uncorrected data:
I have read that for batch correction to be valid, data must be well balanced. But what exactly does this mean?
- I compare proteomes of patients with different disease genotypes to those of healthy controls. Because I have many (12) different genotypes, including many unknowns, I suspect I will need to use only disease vs. no disease as the status to be protected. Is that correct? In total I have 70 healthy controls and 70 patients, but if I split by genotype the patient group becomes widely dispersed, with many genotypes represented by a single sample and many unknowns (which definitely do not all share the same genotype).
- If I want to correct for multiple factors, is it important that each factor is evenly distributed on its own, or must they also be balanced in combination? For example: my samples are well balanced for date processed and also for cell number (meaning that in both cases I have roughly equal numbers of healthy and patient samples). But if I check the distribution by date AND cell number together, the balance is quite off (a few combinations have no patients or no healthy samples). So is it required that both are balanced jointly, or is it fine as long as each factor is balanced by itself?
- Is it fine to apply extensive correction (e.g. for date.processed, protease.inhibitor, cell number and age) to the data used for visualisation and clustering, while blocking only for date_processed (and protease inhibitor) in limma? When blocking in limma, is it important to check again for a well-balanced distribution, again for all factors in combination? (I suspect the answer is yes, and that I will need to pick one factor to correct for and leave the rest as they are.)
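To make the second point concrete, here is a minimal sketch of what I mean by "balanced for each factor but not in combination". The sample annotations are made up for illustration (they are not my real data), and the names `status`, `date_processed` and `cell_number` as well as the helper functions are hypothetical. It only counts samples per cell of the cross-tabulation; it is not a statistical test of balance.

```python
from collections import Counter

# Hypothetical annotations: (status, date_processed, cell_number bin).
# Toy values chosen so each factor is balanced alone but not jointly.
samples = [
    ("healthy", "day1", "low"),
    ("healthy", "day1", "low"),
    ("patient", "day1", "high"),
    ("patient", "day1", "high"),
    ("healthy", "day2", "high"),
    ("healthy", "day2", "high"),
    ("patient", "day2", "low"),
    ("patient", "day2", "low"),
]


def crosstab(rows, factor_idx):
    """Count samples per (factor level combination, status)."""
    counts = Counter()
    for row in rows:
        level = tuple(row[i] for i in factor_idx)
        counts[(level, row[0])] += 1
    return counts


def empty_cells(rows, factor_idx):
    """List observed factor level combinations that lack one of the statuses."""
    counts = crosstab(rows, factor_idx)
    statuses = sorted({row[0] for row in rows})
    levels = sorted({level for (level, _status) in counts})
    return [
        (level, status)
        for level in levels
        for status in statuses
        if counts[(level, status)] == 0  # Counter returns 0 for unseen keys
    ]


# Index 1 = date_processed, index 2 = cell_number bin.
print("date only:       ", empty_cells(samples, [1]))
print("cell number only:", empty_cells(samples, [2]))
print("date x cell:     ", empty_cells(samples, [1, 2]))
```

In this toy example neither factor has an empty cell on its own, but every date/cell-number combination contains only healthy or only patient samples, which is an extreme version of the situation I describe above.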
Thanks a lot for your suggestions!