It would help if you provided the citation, but most likely the authors are trying to minimize the effect of cryptic (i.e. unwanted and unplanned) genetic diversity as a confounder in their eQTL study. If you mean to sample people from a single population and run an association test against the genotypes from that population, you'd like the only things affecting the dependent variable to be the genotype and the other "official" covariates. In a population-based sample, however, you can end up with subgroups of subjects that differ systematically in their genetic structure.

Let's say you have patients from the North and from the South, and patients within a geographic group are genetically more similar to each other than they are to patients in the other group. Some of the time these differences will co-vary with your dependent variable, misleading you about the effect of a given genotype. One way this can happen is if the two populations have different minor allele frequencies at a given locus or set of loci, and within each population there is no association between genotype and the dependent variable. If the variable nonetheless differs between the cryptic populations, the genotype will look associated with it in the pooled sample, when the real association is with population membership as a whole.

Another case is where you think you have patients from a single ethnic background (and therefore a genetic background with a given degree of similarity), but a minority of them carry a significant genetic contribution from some other ethnicity. Usually you'd like to remove those effects as best you can so that you test only the effect of genotype on your dependent variable.
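The North/South scenario can be sketched as a small simulation. This is only an illustration under made-up assumptions: the allele frequencies (0.1 and 0.5), the size of the expression shift between groups, the sample size, and the function names `simulate_two_groups` and `genotype_slope` are all hypothetical and not taken from any study. By construction the genotype has no effect on expression; only group membership does.

```python
import numpy as np

def simulate_two_groups(n_per_group=2000, maf_north=0.1, maf_south=0.5,
                        expr_shift=1.0, seed=0):
    """Simulate one SNP and one expression trait in two hidden subgroups.

    Genotype has NO effect on expression; only group membership does.
    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_per_group)            # 0 = North, 1 = South
    maf = np.where(group == 0, maf_north, maf_south)  # group-specific allele frequency
    genotype = rng.binomial(2, maf)                   # minor-allele count: 0, 1 or 2
    expression = expr_shift * group + rng.normal(size=group.size)
    return group, genotype, expression

def genotype_slope(genotype, expression, covariates=None):
    """Least-squares slope of expression on genotype, with optional covariates."""
    columns = [np.ones(genotype.size), genotype.astype(float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return beta[1]  # coefficient on the genotype column

group, genotype, expression = simulate_two_groups()
naive_slope = genotype_slope(genotype, expression)
adjusted_slope = genotype_slope(genotype, expression, covariates=group)
print(f"pooled, unadjusted slope: {naive_slope:.3f}")
print(f"slope adjusted for group: {adjusted_slope:.3f}")
```

In the pooled sample the genotype picks up a clearly non-zero slope even though it does nothing, and the slope shrinks toward zero once group membership is in the model. In a real study the group label is exactly what you don't have, which is why a proxy for it is needed.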
The PCA in this case is an attempt to capture the greatest sources of undesired variance in the genotype data, thus reducing the effect of cryptic diversity; the top principal components are typically included as covariates in the association model. You would probably test empirically for the "correct" number of PCs to adjust for; I don't know whether there is an established dogma about this, or whether it's just part of the practice of genetic epidemiology to look for PCs that appear to be affecting the analysis and adjust for them.
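As a rough sketch of the idea, here is one way to compute genotype PCs and use them as covariates, on simulated data. Everything here is an assumption for illustration: the two hidden groups, the 500 SNPs, the allele frequency model, the choice of two PCs, and the helper names `top_pcs` and `genotype_slope`. The PCs are computed with a plain SVD of the standardized genotype matrix; I'm not claiming this is what the authors of the paper did.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_snps = 1000, 500
group = np.repeat([0, 1], n_per_group)  # hidden subpopulation, unknown in practice

# Hypothetical allele frequencies that drift apart between the two groups
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_snps), 0.05, 0.95)
freqs = np.where(group[:, None] == 0, freq_a, freq_b)  # shape (samples, SNPs)
genotypes = rng.binomial(2, freqs)                     # allele counts 0/1/2

# Expression depends on the hidden group only, not on any genotype
expression = 1.0 * group + rng.normal(size=group.size)

def top_pcs(G, k):
    """Scores on the top k principal components of a samples x SNPs matrix."""
    Z = G - G.mean(axis=0)
    sd = Z.std(axis=0)
    Z = Z[:, sd > 0] / sd[sd > 0]  # drop monomorphic SNPs, scale the rest
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def genotype_slope(genotype, y, covariates=None):
    """Least-squares slope of y on genotype, with optional covariates."""
    columns = [np.ones(genotype.size), genotype.astype(float)]
    if covariates is not None:
        columns.append(np.asarray(covariates, dtype=float))
    beta, *_ = np.linalg.lstsq(np.column_stack(columns), y, rcond=None)
    return beta[1]

pcs = top_pcs(genotypes, k=2)
pc1_group_corr = np.corrcoef(pcs[:, 0], group)[0, 1]

# Test the SNP whose frequency differs most between groups (worst case)
snp = genotypes[:, np.argmax(np.abs(freq_a - freq_b))]
naive_slope = genotype_slope(snp, expression)
adjusted_slope = genotype_slope(snp, expression, covariates=pcs)
print(f"|corr(PC1, hidden group)|: {abs(pc1_group_corr):.3f}")
print(f"slope without PCs: {naive_slope:.3f}")
print(f"slope with 2 PCs:  {adjusted_slope:.3f}")
```

The point is that the first PC recovers the hidden group fairly well without ever being told about it, so adjusting for the PCs removes most of the spurious signal. How many PCs to include is still the empirical question mentioned above; the `k=2` here is arbitrary.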