Question: Removing ancestral outliers in GWAS
19 months ago, johnja wrote:

I am new to GWAS. I am now at the step where I want to remove cases and controls of non-European ancestry. So I recently performed principal components analysis using plink on cases and controls for a practice GWAS analysis. I then merged my data with the data on the 11 populations from HapMap3. I am unclear how to proceed in the next steps, and I feel like the many articles I have viewed assume that the reader already has certain knowledge.

My thoughts on what to do next are to:

1. Use R to subset the CEU and TSI populations, as they are European.
2. Find the means and standard deviations of the first two principal component scores.
3. Choose a threshold value to determine outliers (a certain number of standard deviations away from the mean PC1 and PC2 scores).
4. Write an R script to produce a file listing the cases and controls to exclude for non-European ancestry.
5. Use plink to remove those non-European samples.

My question: Is this method correct? I have no idea what the threshold for outliers should be set to.

19 months ago, Kevin Blighe (Republic of Ireland) wrote:

Hello johnja,

Yes, it is quite standard to remove samples that lie more than 2 or 3 standard deviations (SDs) from the group mean on the PCA. You can code this manually by converting the values for a given eigenvector (i.e. principal component) to Z-scores, where Z=1 is 1 SD from the mean, Z=2 is 2 SDs, et cetera. For example, if you know that your suspected outlier is from the British Isles (Republic of Ireland and the United Kingdom of Great Britain and Northern Ireland), then check its Z-score in relation to the other GBR (British in England and Scotland) 1000 Genomes EUR samples.
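As a rough illustration of the manual Z-score approach, here is a minimal Python sketch (the same logic carries over to R). The function names, the input layout, and the toy numbers are my own assumptions for illustration, not part of any PLINK output format; you would load PC1 and PC2 from your own eigenvector file. The mean and SD are taken from the reference European samples only, and a study sample is flagged if it exceeds the chosen threshold on any PC.

```python
from statistics import mean, stdev


def flag_ancestry_outliers(reference_pcs, study_pcs, n_sd=3.0):
    """Flag study samples lying more than n_sd SDs from the reference mean.

    reference_pcs: list of (PC1, PC2, ...) tuples for the reference group
                   (e.g. European reference samples).
    study_pcs:     dict mapping (FID, IID) -> (PC1, PC2, ...) for your
                   cases and controls.
    Returns a list of (FID, IID) whose |Z| exceeds n_sd on at least one PC.
    """
    n_pcs = len(reference_pcs[0])
    # Mean and sample SD of each PC, computed from the reference group only
    centers = [mean(row[i] for row in reference_pcs) for i in range(n_pcs)]
    spreads = [stdev(row[i] for row in reference_pcs) for i in range(n_pcs)]

    outliers = []
    for sample_id, pcs in study_pcs.items():
        # Z-score of this sample on each PC relative to the reference group
        z_scores = [(pcs[i] - centers[i]) / spreads[i] for i in range(n_pcs)]
        if any(abs(z) > n_sd for z in z_scores):
            outliers.append(sample_id)
    return outliers


def write_remove_list(outliers, path):
    """Write one 'FID IID' pair per line, the layout PLINK's --remove expects."""
    with open(path, "w") as handle:
        for fid, iid in outliers:
            handle.write(f"{fid} {iid}\n")


# Toy values, invented purely for illustration (not real PCA output)
reference = [(0.010, 0.020), (0.012, 0.018), (0.008, 0.022),
             (0.011, 0.019), (0.009, 0.021)]
study = {
    ("F1", "S1"): (0.0105, 0.0195),  # sits inside the reference cluster
    ("F2", "S2"): (0.050, -0.030),   # far away on both PCs
}
flagged = flag_ancestry_outliers(reference, study, n_sd=3.0)
```

The resulting list can be written out with `write_remove_list(flagged, "ancestry_outliers.txt")` (the filename is arbitrary) and passed to PLINK via `--remove`. Whether 2 or 3 SDs is the better cut-off depends on how tight your reference cluster is, so plot PC1 against PC2 and check which samples get flagged before removing anything.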

PLINK already has an implementation of this through identity-by-state (IBS) clustering, where it also gauges outliers by Z-scores.

