I would like to perform an allelic association test on my genome-wide SNP data, which comprises 500,000 SNPs genotyped in 500 controls and 250 cases. I suspect that some of the samples are outliers (PCA plots also suggest this), and I would like to minimise their effect on the resulting p-values (chi-square test). I therefore plan to apply resampling: I iteratively draw a subset of the samples, perform the test on that subset, and collect the resulting p-values. Afterwards, I keep only those SNPs that show a strongly significant result in the majority of the tests.
Do you think this is a valid approach? Do you know of any alternative approaches for testing the robustness of the allele association test?
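
To make the procedure concrete, here is a minimal sketch of what I have in mind for a single SNP. It is only an illustration under my own assumptions: genotypes are coded as minor-allele counts (0, 1 or 2), the subsets are drawn without replacement, and the function names, the 80% subset size, the 200 iterations and the 5e-8 threshold are all placeholders I picked, not established settings.

```python
import math
import random


def allelic_chi2_pvalue(case_genotypes, control_genotypes):
    """Allelic 2x2 chi-square test (1 df, no continuity correction) for one SNP.

    Genotypes are assumed to be coded as the number of minor alleles: 0, 1 or 2.
    """
    a = sum(case_genotypes)                    # minor alleles in cases
    b = 2 * len(case_genotypes) - a            # major alleles in cases
    c = sum(control_genotypes)                 # minor alleles in controls
    d = 2 * len(control_genotypes) - c         # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # SNP is monomorphic in this subset: no evidence either way
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def resampled_support(cases, controls, n_iter=200, fraction=0.8,
                      alpha=5e-8, seed=0):
    """Fraction of random subsets in which the SNP has p < alpha.

    n_iter, fraction and alpha are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    k_cases = max(1, int(fraction * len(cases)))
    k_controls = max(1, int(fraction * len(controls)))
    hits = 0
    for _ in range(n_iter):
        sub_cases = rng.sample(cases, k_cases)          # without replacement
        sub_controls = rng.sample(controls, k_controls)
        if allelic_chi2_pvalue(sub_cases, sub_controls) < alpha:
            hits += 1
    return hits / n_iter


if __name__ == "__main__":
    # Toy genotypes for one SNP, simulated only to show the call pattern.
    rng = random.Random(1)
    cases = [rng.choice([0, 1, 1, 2, 2]) for _ in range(250)]
    controls = [rng.choice([0, 0, 1, 1, 2]) for _ in range(500)]
    support = resampled_support(cases, controls)
    print("fraction of subsets with p < 5e-8:", support)
    print("keep SNP (majority rule):", support > 0.5)
```

In the real analysis this would be repeated for every SNP, and a SNP would be kept only if its support exceeds 0.5, which is the majority rule described above.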