I have a relatively small dataset of cases (n ≈ 800) and a much larger number of controls (n ≈ 8000) in PLINK format. For further analysis I would like a case-control ratio of 1:3. Is there a method, or are there scripts available, that perform some kind of PCA-based matching of controls to cases to create a more homogeneous population, or is random sampling of controls the usual way to go? I tried filtering controls with a threshold of e.g. 3 SD, but not many controls were removed, which suggests the data are already fairly homogeneous.
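
To make concrete what I mean by PCA matching, here is a minimal sketch of one possible approach: greedy nearest-neighbour matching without replacement in PC space, picking 3 controls per case. It is an illustration under assumptions, not an established tool. The function name `match_controls`, the dict-based input layout, and the toy data are all hypothetical; in practice the PC values would come from something like `plink --pca`, using however many PCs seem appropriate.

```python
import math
import random


def match_controls(case_pcs, control_pcs, ratio=3, seed=0):
    """Greedy nearest-neighbour matching without replacement in PC space.

    case_pcs / control_pcs: dict mapping sample ID -> sequence of PC values
    (hypothetical layout, chosen for this sketch).
    Returns a dict mapping each case ID to a list of `ratio` control IDs.
    """
    if len(control_pcs) < ratio * len(case_pcs):
        raise ValueError("not enough controls for the requested ratio")

    rng = random.Random(seed)
    available = dict(control_pcs)  # controls not yet assigned to any case
    matches = {case_id: [] for case_id in case_pcs}

    # Assign one control per case per round, so that no single case
    # uses up all of the nearby controls before the others get a turn.
    for _ in range(ratio):
        order = list(case_pcs)
        rng.shuffle(order)  # avoid favouring the same cases in every round
        for case_id in order:
            case_vec = case_pcs[case_id]
            # Closest remaining control by Euclidean distance on the PCs.
            best = min(
                available,
                key=lambda cid: math.dist(case_vec, available[cid]),
            )
            matches[case_id].append(best)
            del available[best]
    return matches


if __name__ == "__main__":
    # Toy data standing in for the first two PCs (made-up values).
    rng = random.Random(1)
    cases = {f"case{i}": [rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(5)}
    controls = {f"ctrl{i}": [rng.gauss(0, 1), rng.gauss(0, 1)] for i in range(50)}
    for case_id, ctrl_ids in match_controls(cases, controls, ratio=3).items():
        print(case_id, ctrl_ids)
```

The matched control IDs could then be written to a file and passed to PLINK's `--keep` together with the case IDs. Greedy matching is not globally optimal, and unweighted Euclidean distance treats all PCs equally, so this is only a starting point for comparison against plain random sampling.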