Question: PLINK Principal Components not adequately controlling for population stratification in linear regression?
4.1 years ago, dam4l (150) wrote:

I'm doing a GWAS using ~15 million variants and ~800 people. I am unfamiliar with Linux, so I have tried using PLINK MDS and PCA functions to obtain principal components to be used as covariates in the association analysis to control for population stratification. When I plotted the p-values (QQ plot) obtained from the association analysis, the distribution was pretty messy, suggesting that I did not adequately control for population stratification. I took the following steps:

  1. Pruned based on LD using PLINK --indep
  2. Created a genome file:

    ./plink --bfile file --genome --extract

  3. Used --pca to generate an eigenvec file containing PCs

    ./plink --bfile gendep_merged --cluster --pca header --extract --read-genome plink.genome

  4. Performed the association analysis using 10 PCs from the eigenvec file as covariates:

    ./plink --bfile file --pheno phenotype.txt --allow-no-sex --covar plink.eigenvec --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 --out association --linear --adjust
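The inflation seen in a QQ plot can be summarized as a single number, the genomic inflation factor (lambda): the median of the observed 1-df chi-square statistics divided by the median expected under the null. The sketch below is only an illustration, not part of the original workflow. It assumes the association output is a whitespace-delimited table with a header containing `TEST` and `P` columns, and that rows with `TEST` equal to `ADD` hold the per-variant results (covariate rows are skipped). The function names are hypothetical.

```python
from statistics import NormalDist, median

_STD = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided p-value to the matching 1-df chi-square statistic."""
    # Use the lower tail so that very small p-values do not round to 1.0.
    z = _STD.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Return lambda: median observed chi-square over the null median."""
    stats = [p_to_chi2(p) for p in p_values if 0.0 < p <= 1.0]
    if not stats:
        raise ValueError("no usable p-values")
    return median(stats) / CHI2_1DF_MEDIAN


def read_additive_p_values(lines):
    """Collect P for rows whose TEST column is ADD (assumed table layout)."""
    rows = iter(lines)
    header = next(rows).split()
    test_col, p_col = header.index("TEST"), header.index("P")
    p_values = []
    for line in rows:
        fields = line.split()
        if len(fields) <= max(test_col, p_col):
            continue  # skip blank or truncated lines
        if fields[test_col] == "ADD" and fields[p_col] != "NA":
            p_values.append(float(fields[p_col]))
    return p_values


if __name__ == "__main__":
    # Hypothetical file name; adjust to whatever --out produced.
    with open("association.assoc.linear") as handle:
        print(genomic_inflation(read_additive_p_values(handle)))
```

A value close to 1 suggests little inflation; values well above 1 are consistent with residual stratification or other confounding, although with ~800 samples and many rare variants the test statistics themselves can also be poorly calibrated.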

Am I missing a step, or should any of the flags used be modified in order to produce PCs that will adequately control for population stratification in this sample?

Any input would be greatly appreciated.

modified 4.1 years ago by andrew.j.skelton73 (5.9k) • written 4.1 years ago by dam4l (150)

How exactly is using the first ten principal components controlling for "population stratification"? If I understand correctly, you're performing an association test and telling the model fit to smooth out the ten biggest drivers of variance in your dataset? When you checked the principal components, did they indicate that the first ten explained the difference in population? Could you be smoothing out the effect you're testing for instead?

written 4.1 years ago by andrew.j.skelton73 (5.9k)

Using 10 does indeed seem a bit excessive. You should only use the PCs that actually stratify your population. If that's none of them, then do not include any.
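As a rough first look at how many PCs carry noticeable structure, the eigenvalues written alongside the eigenvectors can be turned into relative shares. This is a minimal sketch, assuming a file with one eigenvalue per line; the file name and function names are hypothetical. The shares are relative to the eigenvalues reported, not to total genetic variance, and a large share alone does not show that a PC reflects ancestry, so plotting the PCs against known population labels is still needed.

```python
def read_eigenvalues(lines):
    """Parse one eigenvalue per non-empty line (assumed file layout)."""
    return [float(line) for line in lines if line.strip()]


def variance_shares(eigenvalues):
    """Return each eigenvalue as a fraction of the sum of those reported."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [value / total for value in eigenvalues]


if __name__ == "__main__":
    # Hypothetical file name; adjust to the actual output prefix.
    with open("plink.eigenval") as handle:
        shares = variance_shares(read_eigenvalues(handle))
    for index, share in enumerate(shares, start=1):
        print(f"PC{index}\t{share:.3f}")
```

If the shares flatten out after the first few components, the later PCs are less likely to be capturing population structure, and they are candidates for leaving out of `--covar-name`.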

written 19 months ago by Kevin Blighe (56k)