Question: Determining the subjects to use after obtaining a PC plot of population substructure
I am doing QC for a GWAS. I used PC-AiR and PC-Relate (both from Bioconductor) to assess the relatedness and population substructure of my dataset. I compared it to 1000 Genomes data and have a plot of the first two PCs from my PCA. In general, what is the best practice for excluding subjects from a study after visually inspecting the PC plot? Is there a specific method (e.g. an R package) that is considered best practice, or do I arbitrarily decide, based on the graph, which set of subjects to include?

Thanks for your thoughts, in advance.

Here you will find a very detailed answer. I also suggest you take a look at the GenABEL manual. In paragraph 5.3 they describe the method used for outlier detection.
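To make the idea of a non-arbitrary cutoff concrete, here is a minimal sketch of one common rule of thumb: flag any sample that lies more than a chosen number of standard deviations from the mean of a reference cluster on any of the PCs you are looking at. This is not necessarily the method described in either resource above. The function name `flag_pc_outliers`, the sample IDs, the PC values, and the default threshold of 6 SD are all assumptions made for illustration, and you would feed in the PC coordinates exported from your own analysis.

```python
from statistics import mean, stdev


def flag_pc_outliers(pcs, reference_ids, n_sd=6.0):
    """Return sorted IDs of samples lying more than n_sd standard deviations
    from the reference-cluster mean on at least one PC.

    pcs           -- dict mapping sample ID to a tuple of PC coordinates
    reference_ids -- IDs of the samples that define the target cluster
                     (hypothetical: e.g. one 1000 Genomes population)
    n_sd          -- threshold in SD units (an assumed, adjustable cutoff)
    """
    n_pcs = len(next(iter(pcs.values())))
    ref = [pcs[s] for s in reference_ids]

    # Per-PC centre and spread of the reference cluster.
    centers = [mean([r[k] for r in ref]) for k in range(n_pcs)]
    spreads = [stdev([r[k] for r in ref]) for k in range(n_pcs)]

    outliers = []
    for sample, coords in pcs.items():
        # Flag the sample if it is too far from the centre on any PC.
        if any(abs(coords[k] - centers[k]) > n_sd * spreads[k]
               for k in range(n_pcs)):
            outliers.append(sample)
    return sorted(outliers)


# Made-up PC1/PC2 values, purely for illustration.
pcs = {
    "ref1": (0.010, 0.020),
    "ref2": (0.012, 0.018),
    "ref3": (0.008, 0.022),
    "ref4": (0.011, 0.021),
    "ref5": (0.009, 0.019),
    "s1": (0.013, 0.017),   # sits inside the reference cluster
    "s2": (0.080, -0.050),  # far away from it
}
reference_ids = ["ref1", "ref2", "ref3", "ref4", "ref5"]

print(flag_pc_outliers(pcs, reference_ids))
```

The threshold is still a judgment call, but writing it down as an explicit rule makes the exclusion reproducible and easy to report, which is usually preferable to picking points off the plot by eye.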

