I am running a GWAS using GEMMA to fit a linear mixed model, and I've been having some difficulties with my analysis. Specifically, the QQ plot shows some early deviation from the expected line, as I described in a previous question. I also recently noticed that the GEMMA log file reports a PVE under the null model of roughly 1e-5, with a standard error of 0.2. I am not sure how to interpret this. Does this mean that my phenotype is almost entirely environmental rather than genetic? Or does it mean the opposite, because that's the estimate under the "null model"?
The extremely low PVE estimate (and comparatively large standard error) makes me think that there may be something wrong with my analysis setup. An additional note: if I omit the covariates file (containing the top 5 principal components), the QQ plot ends up with a strange wavy-looking shape, further suggesting that something is amiss. I've included some extra details about my analysis below. If anyone has any suggestions on what I might be doing wrong, I'd greatly appreciate it.
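To put the estimate and its standard error side by side, here is a rough sketch of a Wald-type 95% interval. The function name is my own, and a symmetric interval is only a crude approximation when the estimate sits at the boundary of [0, 1], so please treat it as an illustration rather than a proper confidence interval.

```python
def pve_wald_interval(pve, se, z=1.96):
    """Approximate 95% interval for PVE, clipped to the valid range [0, 1]."""
    lower = max(0.0, pve - z * se)
    upper = min(1.0, pve + z * se)
    return lower, upper


# Values reported in my GEMMA log file
low, high = pve_wald_interval(1e-5, 0.2)
print(f"PVE interval: [{low:.2f}, {high:.2f}]")  # prints "PVE interval: [0.00, 0.39]"
```

With my numbers the interval runs from 0 to roughly 0.39, which is part of why I'm unsure how much to read into the point estimate.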
This is the command I used:
gemma -bfile myfile_clean -k myfile_cXX.txt -lmm 2 -c covariates.txt -o myfile
The p-value estimates are based on a likelihood ratio test. I used GEMMA to estimate the centered genetic relationship matrix after LD pruning, and also included the eigenvectors for the top 5 principal components (generated by GCTA) as covariates. None of these principal components were significantly associated with my phenotype (based on a logistic regression), but I included them as covariates anyway, and as I said, omitting them causes a weird wavy-looking QQ plot. I'm using a Drosophila model (150 inbred lines) and determined that Wolbachia infection status and major chromosomal inversions are not associated with my phenotype, so I didn't include them as covariates. When I ran it again with them included as covariates in addition to the principal components, both the QQ plot and the PVE estimate looked basically the same.
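In case it helps to quantify the QQ plot deviation rather than judge it by eye, this is a minimal sketch of how the genomic inflation factor (lambda) could be computed from the p-values. It assumes the likelihood ratio test p-values correspond to a chi-square statistic with 1 degree of freedom and that they sit in a tab-delimited column named `p_lrt`; the column name and both function names are assumptions to check against the actual output file.

```python
import csv
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its expected median."""
    std_normal = NormalDist()
    # A p-value p maps back to chi-square = z**2, where z = Phi^{-1}(p / 2)
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values if 0 < p <= 1]
    return median(chi2) / CHI2_1DF_MEDIAN


def read_p_values(path, column="p_lrt"):
    """Read one p-value column from a tab-delimited association file.

    The column name is an assumption; check the header of your own file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [float(row[column]) for row in reader]
```

Something like `genomic_inflation(read_p_values("output/myfile.assoc.txt"))` would then give a single number, where the path is my guess at where the results land. Values near 1 would suggest the bulk of the test statistics follow the null, while values clearly above or below 1 would match the upward or downward deviations I'm seeing.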
Also, my phenotype is binary (case/control) and I'm dealing with an unbalanced dataset (30 cases, 120 controls). When I tried balancing it by including only 30 randomly selected controls, the resulting QQ plot showed the data falling below the expected line rather than above it, and the PVE was still very low. I'm at a loss as to what could be causing these problems.
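Since a single random draw of 30 controls might not be representative, one thing I could do is repeat the subsampling with several seeds and see whether the QQ plot behaves the same way each time. Below is a minimal sketch of the sampling step only; the `(sample_id, is_case)` layout and the function name are hypothetical, and the toy data just mimics the shape of my dataset.

```python
import random


def subsample_controls(samples, n_controls, seed):
    """Keep every case plus a random subset of the controls.

    samples: list of (sample_id, is_case) tuples (hypothetical layout).
    """
    cases = [s for s in samples if s[1]]
    controls = [s for s in samples if not s[1]]
    rng = random.Random(seed)  # a fixed seed makes each draw reproducible
    return cases + rng.sample(controls, n_controls)


# Toy data shaped like my dataset: 30 cases and 120 controls
samples = [(f"line{i}", i < 30) for i in range(150)]
for seed in range(3):
    kept = subsample_controls(samples, n_controls=30, seed=seed)
    n_cases = sum(1 for _, is_case in kept if is_case)
    print(f"seed {seed}: {n_cases} cases, {len(kept) - n_cases} controls")
```

Each subset would then need to be run through the same GEMMA command separately, so that any pattern that shows up across all draws is less likely to be an accident of one particular sample.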