Removing ancestral outliers in GWAS
6.5 years ago
johnja • 0

I am new to GWAS. I am now at the step where I want to remove cases and controls of non-European ancestry. So I recently performed principal components analysis using plink on cases and controls for a practice GWAS analysis. I then merged my data with the data on the 11 populations from HapMap3. I am unclear how to proceed in the next steps, and I feel like the many articles I have viewed assume that the reader already has certain knowledge.

My thoughts on what to do next are to:

1. Use R to subset the CEU and TSI populations, as they are European.
2. Find the means and standard deviations of the first two principal component scores.
3. Choose a threshold value to determine outliers (a certain number of standard deviations away from the mean PC1 and PC2 scores).
4. Write an R script to produce a file listing the cases and controls to eliminate for non-European ancestry.
5. Use plink to eliminate those non-European samples.

My question: Is this method correct? I have no idea what value the outlier threshold should be set to.

genome gwas • 4.5k views
6.5 years ago

Hello johnja,

Yes, it is quite standard to remove samples that lie more than 2 or 3 standard deviations (SDs) from the group mean on the principal components. You can code this manually by converting the values for a given eigenvector (i.e. principal component) to Z-scores, where Z=1 is 1 SD from the mean, Z=2 is 2 SDs, et cetera. For example, if you know that your suspected outlier is from the British Isles (Republic of Ireland and the United Kingdom of Great Britain and Northern Ireland), then check its Z-score in relation to the other GBR (British in England and Scotland) 1000 Genomes EUR samples.
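A minimal sketch of the manual Z-score route, in Python with only the standard library. The function names (`z_scores`, `flag_outliers`), the sample IDs, and the PC scores are all made up for illustration; the 3 SD threshold is just one of the values mentioned above, and the reference rows stand in for whichever population you compare against (e.g. CEU and TSI, or GBR).

```python
from statistics import mean, stdev

def z_scores(values, reference):
    """Z-score each value against the mean and SD of the reference values."""
    mu = mean(reference)
    sd = stdev(reference)  # sample SD (n - 1 denominator)
    return [(v - mu) / sd for v in values]

def flag_outliers(samples, reference, n_pcs=2, threshold=3.0):
    """Return IDs whose |Z| exceeds the threshold on any of the first n_pcs PCs.

    samples   : dict mapping sample ID -> list of PC scores
    reference : list of PC-score lists for the reference population
    """
    flagged = set()
    ids = list(samples)
    for pc in range(n_pcs):
        ref_col = [row[pc] for row in reference]
        col = [samples[i][pc] for i in ids]
        for sample_id, z in zip(ids, z_scores(col, ref_col)):
            if abs(z) > threshold:
                flagged.add(sample_id)
    return sorted(flagged)

# Toy, made-up PC1/PC2 scores purely for illustration (not real data)
reference = [
    [0.010, 0.020],
    [0.012, 0.018],
    [0.009, 0.021],
    [0.011, 0.019],
    [0.008, 0.022],
]
samples = {
    "case1": [0.010, 0.020],   # sits at the reference mean
    "case2": [0.080, 0.019],   # far from the reference on PC1
    "ctrl1": [0.011, -0.050],  # far from the reference on PC2
}

outliers = flag_outliers(samples, reference, threshold=3.0)
print(outliers)
```

Here a sample is flagged if it exceeds the threshold on either PC; requiring both, or using more PCs, is a design choice that depends on how your PCA plot looks.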

PLINK already has an implementation of this through identity-by-state (IBS) clustering, where it also gauges outliers by Z-scores: http://zzz.bwh.harvard.edu/plink/strat.shtml#outlier
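Whichever way the outliers are identified, they still have to be handed back to PLINK for removal. PLINK's `--remove` option reads a plain-text file with one family ID and individual ID per line, so a rough sketch of that last step could look like the following. The helper name `write_remove_list`, the IDs, and the file name are hypothetical.

```python
import os
import tempfile

def write_remove_list(path, outliers):
    """Write one 'FID IID' pair per line, the layout PLINK's --remove reads."""
    with open(path, "w") as handle:
        for fid, iid in outliers:
            handle.write(f"{fid} {iid}\n")
    return len(outliers)

# Hypothetical flagged samples as (family ID, individual ID) pairs
flagged = [("FAM001", "case2"), ("FAM007", "ctrl1")]

remove_path = os.path.join(tempfile.gettempdir(), "ancestry_outliers.txt")
n_written = write_remove_list(remove_path, flagged)

# Read the file back to confirm what was written
with open(remove_path) as handle:
    written_lines = handle.read().splitlines()
print(written_lines)
```

The resulting file can then be passed to PLINK along the lines of `plink --bfile mydata --remove ancestry_outliers.txt --make-bed --out mydata_eur`, where `mydata` and `mydata_eur` are placeholder fileset names.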

Kevin
