I am doing a limma analysis of a data set comprising 4 groups with 50 samples in each. In total I have 5 different comparisons: Group1 vs Group2, Group1 vs Group3, and so on. For each comparison, limma gives me a set of differentially expressed genes.
Next I want to do a leave-one-out cross-validation (LOOCV) of the results for each group comparison, so 5 different LOOCVs in total. In each LOOCV iteration I redo the feature selection with limma. The problem is that each LOOCV can include only the two groups being compared, i.e. 100 samples in total. The limma results will then differ between the 100-sample subset and the full 200-sample dataset, because the normalisation and filtering steps will be affected differently.
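To make the procedure concrete, here is a minimal sketch of one such LOOCV for a single two-group comparison. It is written in Python with NumPy only, and several things in it are assumptions rather than my actual pipeline: a plain Welch t-statistic stands in for limma's moderated statistics, a nearest-centroid rule stands in for the classifier, the data are simulated, and the names `select_features`, `loocv_accuracy` and `simulate_two_groups` are made up for illustration. The point it shows is only that the feature selection is repeated inside every iteration on the 99 training samples.

```python
import numpy as np


def select_features(X, y, k):
    """Return indices of the k genes with the largest |Welch t| between groups.

    Assumption: this is a simple stand-in for limma's moderated t-statistic.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.argsort(-np.abs(t))[:k]


def loocv_accuracy(X, y, k=20):
    """LOOCV in which feature selection is redone on the training samples only."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i          # leave sample i out
        X_tr, y_tr = X[train], y[train]
        # Any normalisation or filtering would also go here, fitted on X_tr only.
        genes = select_features(X_tr, y_tr, k)
        c0 = X_tr[y_tr == 0][:, genes].mean(axis=0)   # centroid of group 0
        c1 = X_tr[y_tr == 1][:, genes].mean(axis=0)   # centroid of group 1
        x = X[i, genes]
        pred = int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))
        correct += int(pred == y[i])
    return correct / n


def simulate_two_groups(n_per_group=50, n_genes=500, n_signal=10, shift=2.0, seed=0):
    """Simulated expression matrix (samples x genes); purely illustrative data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_group, n_genes))
    y = np.repeat([0, 1], n_per_group)
    X[y == 1, :n_signal] += shift          # first n_signal genes differ between groups
    return X, y


X, y = simulate_two_groups()               # 100 samples: the two groups being compared
acc = loocv_accuracy(X, y, k=20)
print(f"LOOCV accuracy: {acc:.2f}")
```

In this sketch the left-out sample never influences which genes are selected, which is what I am trying to achieve with limma as well.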
Is it correct to do the LOOCV with feature selection on only the 100 samples?