I am working on a 20-sample dataset and need to identify DE genes. 10 samples were collected at collection center 1 (CC1) and 10 at CC2. Each center has 5 samples of condition1 and 5 of condition2. Unfortunately, when I tested DE for CC1 vs CC2 only, several survival pathways came out upregulated in the CC2 samples. This is because CC2 sent us its samples later than CC1 did (in terms of days). When I tested DE for condition1 vs condition2 (using CC as a covariate), I could not detect any DE genes (FDR < 0.05). From the PCA I can see that the differences between CCs are far stronger than those between condition1 and condition2. To my knowledge, both ComBat and removeBatchEffect (limma) destroy biological variability, so I am not confident about using them. My question is: what should I do in this setup?
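To make "using CC as a covariate" concrete, here is a minimal sketch for a single gene. It is plain Python, not limma or edgeR, and it assumes the balanced layout described above (5 samples per condition per center). The expression values, the size of the center shift and the helper name `additive_fit` are all made up for illustration. It shows that including center in the model leaves the condition estimate unchanged in a balanced design but removes the center shift from the residual variation the test is judged against.

```python
from statistics import mean

def additive_fit(values, condition, center):
    """Fit y ~ condition + center for one gene in a BALANCED design.

    Returns (condition_effect, rss_condition_only, rss_with_center).
    The closed-form fitted values used here are only valid when every
    condition/center cell has the same number of samples.
    """
    grand = mean(values)
    cond_levels = sorted(set(condition))
    cond_mean = {c: mean(v for v, k in zip(values, condition) if k == c)
                 for c in cond_levels}
    cc_mean = {c: mean(v for v, k in zip(values, center) if k == c)
               for c in set(center)}
    # Condition effect: second level minus first level (alphabetical order).
    effect = cond_mean[cond_levels[1]] - cond_mean[cond_levels[0]]
    # Residual sum of squares when center is ignored.
    rss_cond = sum((v - cond_mean[k]) ** 2 for v, k in zip(values, condition))
    # Residual sum of squares when center is included as a blocking factor.
    rss_full = sum((v - cond_mean[k] - cc_mean[c] + grand) ** 2
                   for v, k, c in zip(values, condition, center))
    return effect, rss_cond, rss_full

# Hypothetical log-expression values: large center shift, small condition effect.
noise = [0.1, -0.1, 0.2, -0.2, 0.0]
values, condition, center = [], [], []
for cc, cc_shift in (("CC1", 0.0), ("CC2", 3.0)):
    for cond, cond_shift in (("condition1", 0.0), ("condition2", 0.5)):
        for e in noise:
            values.append(5.0 + cc_shift + cond_shift + e)
            condition.append(cond)
            center.append(cc)

effect, rss_cond, rss_full = additive_fit(values, condition, center)
print("condition effect:", effect)
print("RSS without center:", rss_cond)
print("RSS with center:", rss_full)
```

This only handles a center effect that is additive. It does nothing if the late shipment changes how the samples respond to the condition, which is the concern raised in the follow-up.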
Thanks for the answer. However, I am not sure this can be considered a batch effect, since the CC2 samples show true upregulation of specific pathways that could (in principle) buffer the variation when condition1 vs condition2 is tested.
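If that is the case, then you cannot correct for it, because the effect is nested within center: every CC2 sample carries it and no CC1 sample does, so it cannot be separated from the center effect itself.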
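The confounding can be checked on the design matrix. The sketch below is plain Python with a hypothetical helper `matrix_rank`, and the shipping delays (1 day for CC1, 4 days for CC2) are invented for illustration. It assumes 5 samples per condition per center. Adding a delay column that is constant within each center adds a column to the design but does not raise its rank, so no model can estimate a delay effect separately from the center effect.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination (illustrative helper)."""
    m = [[float(x) for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if abs(m[r][col]) > tol), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Columns: intercept, center (0 = CC1, 1 = CC2), condition (0 = condition1, 1 = condition2)
design = [[1, cc, cond] for cc in (0, 1) for cond in (0, 1) for _ in range(5)]

# Hypothetical shipping delay in days: identical for every sample of a center.
delay_days = {0: 1, 1: 4}
design_with_delay = [row + [delay_days[row[1]]] for row in design]

rank_base = matrix_rank(design)
rank_with_delay = matrix_rank(design_with_delay)
print(rank_base, "of", len(design[0]), "columns")
print(rank_with_delay, "of", len(design_with_delay[0]), "columns")
```

The condition column stays estimable in both designs. What is lost is any way of telling "center" and "late shipment" apart.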
If you really do have a reason to suspect that the biology of the samples from CC1 is different from the biology of the samples from CC2, and that CC1 is the correct biology and CC2 the wrong one, then there may be an argument for discarding the samples from CC2 and conducting the analysis on the samples from CC1 only.
That's exactly what I've done. However, the dispersion is really high (these are primary samples), and the total number of samples seems too low to obtain results with good confidence.