I am new to GWAS and am at the step where I want to remove cases and controls of non-European ancestry. I performed principal component analysis (PCA) using plink on the cases and controls for a practice GWAS analysis, and then merged my data with the data on the 11 populations from HapMap3. I am unclear how to proceed from here, and the many articles I have read seem to assume that the reader already has certain background knowledge.
My thoughts on what to do next are to:
1. Use R to subset the CEU and TSI populations, as they are European.
2. Find the means and standard deviations of the first two principal component scores.
3. Choose a threshold value to determine outliers (a certain number of standard deviations away from the mean PC1 and PC2 scores).
4. Write an R script to produce a file listing the cases and controls to eliminate for non-European ancestry.
5. Use plink to eliminate those non-European samples.
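To make the plan concrete, the steps above could be sketched roughly as follows. This is only an illustration of the logic, written in Python rather than R, and not a tested pipeline. The field names (`fid`, `iid`, `pop`, `pc1`, `pc2`), the `STUDY` label for my own cases and controls, and the function names are all hypothetical, and the default of 6 standard deviations is an arbitrary placeholder, since the threshold is exactly what I am unsure about.

```python
import statistics


def flag_ancestry_outliers(samples, reference_pops=("CEU", "TSI"),
                           study_label="STUDY", n_sd=6.0):
    """Return (FID, IID) pairs of study samples outside the reference range.

    `samples` is a list of dicts with keys fid, iid, pop, pc1, pc2
    (hypothetical names). A study sample is flagged if PC1 or PC2 lies
    more than `n_sd` standard deviations from the reference mean.
    `n_sd` is a placeholder value, not a recommendation.
    """
    # Steps 1-2: subset the European reference samples and summarise them.
    reference = [s for s in samples if s["pop"] in reference_pops]
    bounds = {}
    for pc in ("pc1", "pc2"):
        values = [s[pc] for s in reference]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD; needs >= 2 reference samples
        # Step 3: turn the threshold into a lower and upper bound.
        bounds[pc] = (mean - n_sd * sd, mean + n_sd * sd)

    # Step 4: collect study samples that fall outside either pair of bounds.
    outliers = []
    for s in samples:
        if s["pop"] != study_label:
            continue  # HapMap3 reference samples are never flagged
        outside = any(not (low <= s[pc] <= high)
                      for pc, (low, high) in bounds.items())
        if outside:
            outliers.append((s["fid"], s["iid"]))
    return outliers


def write_remove_file(outliers, path):
    """Write one 'FID<TAB>IID' line per flagged sample."""
    with open(path, "w") as handle:
        for fid, iid in outliers:
            handle.write(f"{fid}\t{iid}\n")
```

For step 5, the two-column file written by `write_remove_file` could then be passed to plink's `--remove` option, for example `plink --bfile mydata --remove remove_non_european.txt --make-bed --out mydata_eur`, where all file names are placeholders.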
My question: Is this method correct? I also have no idea what value the outlier threshold should be set to.