I'm running a GWAS for the first time and modeling my analysis on this paper, since the nature of our data is similar. The analysis uses GEMMA to fit a linear mixed model and includes the top five principal components as covariates. However, when I ran the analysis and generated a QQ plot, it showed substantial evidence of population structure (the observed p-values diverge from the expected line early).
To correct the population structure issues, I tried using the top 20 principal components instead of the top 5. I also tried pruning for linkage disequilibrium before generating the principal components. Neither of these significantly altered the QQ plot.
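To put a number on the QQ-plot divergence, one common summary is the genomic inflation factor (lambda): the median observed 1-df chi-square statistic divided by its expected median under the null. Below is a minimal sketch, assuming the p-values have already been read from the association output into a plain list; the function name `genomic_inflation` is hypothetical and not part of GEMMA.

```python
from statistics import NormalDist, median

# Expected median of a chi-square distribution with 1 degree of freedom
# (the square of the 25th percentile of a standard normal, about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.25) ** 2


def genomic_inflation(pvalues):
    """Return lambda: median observed 1-df chi-square / expected median.

    Hypothetical helper; assumes two-sided p-values from 1-df tests.
    """
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to z = Phi^-1(p / 2),
    # and z**2 is the matching 1-df chi-square statistic.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in pvalues if 0.0 < p <= 1.0]
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy input: evenly spaced p-values standing in for a well-calibrated scan.
uniform_pvalues = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(uniform_pvalues)

# Toy input: the same values squared, i.e. systematically too small.
lambda_inflated = genomic_inflation([p ** 2 for p in uniform_pvalues])
```

A value near 1 suggests the test statistics are well calibrated, while values well above 1 are consistent with the early divergence described above. Comparing lambda across the 5-PC, 20-PC, and LD-pruned runs would show whether any of them changed the inflation at all.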
The only other thing I can think of is to try correcting my phenotypes for additional covariates. I'm using a fly model, so I could correct for Wolbachia infection status and the presence of chromosomal inversions (though none of these are individually associated with my phenotype). However, since my phenotype is binary (case/control), the residuals from fitting a logistic model look strange. The phenotypes start out as 1's and 0's but end up looking more like a continuous variable that includes negative numbers and ranges from -2 to 5. This would cause GEMMA to treat it as a quantitative phenotype instead of a binary one and potentially disrupt the analysis.
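A small illustration of why the residuals stop being 0/1: a minimal sketch, assuming Pearson residuals and a single binary covariate (the residual type actually used is not stated above). With one 0/1 covariate plus an intercept, the fitted logistic probability for each group equals that group's observed case fraction, so no model-fitting library is needed here. The data and the function name are hypothetical.

```python
from math import sqrt


def pearson_residuals_one_binary_covariate(y, x):
    """Pearson residuals from a logistic fit of y on one 0/1 covariate x.

    Hypothetical helper. Assumes each covariate group contains both
    cases and controls, so fitted probabilities are strictly in (0, 1).
    """
    fitted = {}
    for level in (0, 1):
        group = [yi for yi, xi in zip(y, x) if xi == level]
        # MLE fitted probability for this group is its case fraction.
        fitted[level] = sum(group) / len(group)
    return [
        (yi - fitted[xi]) / sqrt(fitted[xi] * (1 - fitted[xi]))
        for yi, xi in zip(y, x)
    ]


# Hypothetical toy data: case/control status and Wolbachia infection status.
status = [1, 1, 0, 0, 0, 1, 0, 0]
wolbachia = [1, 1, 1, 0, 0, 0, 0, 0]
residuals = pearson_residuals_one_binary_covariate(status, wolbachia)
```

Each residual is (observed - fitted probability) scaled by the binomial standard deviation, so controls get negative values and a case in a group where cases are rare gets a large positive one. That is the expected behavior of logistic residuals rather than a sign that the fit went wrong.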
Does anyone have advice for how I might proceed?
Edit: One option that just occurred to me is to include the additional covariates (Wolbachia/inversions) in the covariates file along with the eigenvectors generated by PCA. However, all the other papers I've seen with this model used corrected phenotypes (i.e., covariates regressed out) rather than including them in the covariates file, which makes me wonder whether there is some problem with this approach?
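For concreteness, the combined covariates file could be assembled roughly as follows. This is a sketch under assumptions: as far as I understand the GEMMA manual, the covariates file has no header, one row per individual in the same order as the genotype data, and needs an explicit column of 1s if an intercept is wanted. The function name, variable names, and values are all hypothetical.

```python
def covariate_rows(pcs, wolbachia, inversions):
    """Build whitespace-separated covariate rows, one per individual.

    Hypothetical helper. Columns: intercept (1s), PCs, Wolbachia status,
    then one 0/1 column per inversion. Row order must match the
    individuals in the genotype files.
    """
    rows = []
    for pc_row, wolb, inv_row in zip(pcs, wolbachia, inversions):
        values = [1.0, *pc_row, float(wolb), *map(float, inv_row)]
        rows.append(" ".join(str(v) for v in values))
    return rows


# Hypothetical toy data for two flies: two PCs, Wolbachia status,
# and presence/absence of two inversions.
pcs = [[0.012, -0.3], [-0.05, 0.21]]
wolbachia = [1, 0]
inversions = [[0, 1], [1, 0]]

rows = covariate_rows(pcs, wolbachia, inversions)
covariate_text = "\n".join(rows) + "\n"
```

Writing `covariate_text` to a file and passing that file with GEMMA's `-c` option would leave the phenotype column as the original 0/1 values, which avoids the residual problem described above.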