Question: Removing ancestral outliers in GWAS
johnja wrote (2.6 years ago):

I am new to GWAS and have reached the step where I want to remove cases and controls of non-European ancestry. For a practice GWAS analysis, I recently performed principal components analysis (PCA) with plink on my cases and controls, and then merged my data with the data on the 11 populations from HapMap3. I am unsure how to proceed from here, and many of the articles I have read seem to assume that the reader already has certain background knowledge.

My thoughts on what to do next are to:

1. Use R to subset the CEU and TSI populations, as they are European.
2. Find the means and standard deviations of the first two principal component scores (PC1 and PC2) in those populations.
3. Choose a threshold for defining outliers (a certain number of standard deviations away from the mean PC1 and PC2 scores).
4. Write an R script that produces a file listing the cases and controls to remove for non-European ancestry.
5. Use plink to remove those non-European samples.
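The steps above could look roughly like this in code. This is a minimal sketch in Python rather than R, assuming the merged PCA output has already been read into one record per sample; the field names, the "STUDY" label for your own cases and controls, and the `n_sd=3.0` default are all hypothetical choices, not a recommended cutoff. A study sample is flagged if it falls outside the reference mean ± n_sd × SD on either PC1 or PC2.

```python
import statistics

# Hypothetical record layout: each sample is a dict with "FID", "IID",
# "pop" (a HapMap3 label such as "CEU", "TSI", "YRI", or "STUDY" for your
# own cases and controls) and its "PC1"/"PC2" scores from the merged PCA.


def european_outliers(samples, ref_pops=("CEU", "TSI"), n_sd=3.0):
    """Return (FID, IID) pairs of study samples that lie more than n_sd SDs
    from the European reference mean on PC1 or PC2."""
    ref = [s for s in samples if s["pop"] in ref_pops]
    bounds = {}
    for pc in ("PC1", "PC2"):
        values = [s[pc] for s in ref]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD; needs >= 2 reference samples
        bounds[pc] = (mean - n_sd * sd, mean + n_sd * sd)

    outliers = []
    for s in samples:
        if s["pop"] != "STUDY":
            continue  # only your own cases/controls are candidates for removal
        if any(not lo <= s[pc] <= hi for pc, (lo, hi) in bounds.items()):
            outliers.append((s["FID"], s["IID"]))
    return outliers


def format_remove_file(pairs):
    """One 'FID IID' pair per line, the plain-text layout plink --remove reads."""
    return "".join(f"{fid} {iid}\n" for fid, iid in pairs)


def write_remove_file(pairs, path):
    with open(path, "w") as fh:
        fh.write(format_remove_file(pairs))
```

The returned pairs can be written with `write_remove_file(pairs, "non_european.txt")` and passed to PLINK, e.g. `plink --bfile mydata --remove non_european.txt --make-bed --out mydata_eur` (file names hypothetical).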

My question: is this method correct? I also have no idea what the threshold for outliers should be.

Tags: gwas, genome
(question modified 2.6 years ago by Kevin Blighe)
Answer by Kevin Blighe (2.6 years ago):

Hello johnja,

Yes, it is quite standard to remove samples that lie more than 2 or 3 standard deviations (SDs) from the group mean in a PCA. One option is to code this manually by converting the values for a given eigenvector (i.e. principal component) to Z-scores, where |Z| = 1 is 1 SD from the mean, |Z| = 2 is 2 SDs, and so on. For example, if you know that your suspected outlier is from the British Isles (Republic of Ireland and the United Kingdom of Great Britain and Northern Ireland), then check its Z-score relative to the other GBR (British in England and Scotland) 1000 Genomes EUR samples.
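As a minimal sketch of the manual route, the Python below converts one principal component's scores to Z-scores, Z = (x − mean) / SD, and scores a single sample against a reference group such as GBR. It uses the sample standard deviation (`statistics.stdev`); whether to use that or the population SD is an assumption on my part, not something specified here, and the function names are illustrative.

```python
import statistics


def z_scores(values):
    """Convert one principal component's scores to Z-scores: (x - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


def z_relative_to(value, reference_values):
    """Z-score of one sample's PC value against a reference group,
    e.g. the GBR samples, using that group's mean and SD."""
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    return (value - mean) / sd


def is_outlier(z, threshold=3.0):
    """Flag |Z| above the chosen cutoff (2 or 3 SDs are the usual choices)."""
    return abs(z) > threshold
```

For instance, with made-up GBR PC1 scores `[0.010, 0.012, 0.011, 0.009, 0.013]`, a suspected outlier at `0.020` gets a `z_relative_to` value of about 5.7, well beyond either cutoff.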

Alternatively, PLINK already implements this through identity-by-state (IBS) clustering, where it also flags outliers by Z-scores:
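The nearest-neighbour idea behind this kind of IBS-based outlier check can be sketched as follows: for each sample, take its distance to its k-th nearest neighbour, convert those distances to Z-scores across all samples, and flag samples whose neighbours are unusually far away. This is an illustration only, assuming a precomputed pairwise distance matrix (for example 1 minus the IBS proportion); it is not PLINK's code, and PLINK's own statistic, sign convention and thresholds may differ, so check its documentation before relying on them.

```python
import statistics


def nearest_neighbour_z(dist, k=1):
    """dist: symmetric n x n matrix of pairwise distances (assumed, e.g. 1 - IBS).
    For each sample take the distance to its k-th nearest neighbour, then
    standardise those distances across samples into Z-scores."""
    n = len(dist)
    nn = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        nn.append(others[k - 1])
    mean = statistics.mean(nn)
    sd = statistics.stdev(nn)  # if every distance is identical, sd is 0 and the division fails
    return [(d - mean) / sd for d in nn]


def flag_outliers(ids, dist, k=1, threshold=3.0):
    """IDs whose k-th nearest neighbour is unusually far away (Z > threshold)."""
    z = nearest_neighbour_z(dist, k)
    return [sid for sid, zi in zip(ids, z) if zi > threshold]
```

Note that with the sample SD no Z-score can exceed (n − 1)/√n, so in a small cohort (for example n = 10, where the bound is about 2.85) a 3-SD cutoff can never trigger.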


(answer modified 18 months ago)