inflated QQ-PLOT
4 months ago
putty ▴ 30

Dear All,

I ran a linear regression in PLINK 1.90 for a quantitative trait (N=450) using SNP array data, adjusting for PC1-PC5 and 3 clinical covariates. I plotted the QQ plot using the qqman package in R. The plot shows a sharp peak followed by a downward trend, and I am confused about why the trend turns downward after the peak. Any possible explanations would be highly appreciated.

Thanks in advance! (I am new to the field of genetics.)

![qq plot][1]

QQ-plot inflation plink GWAS • 595 views
2
4 months ago

The p-values look right-skewed, I guess heavily so, with that well-defined 'tail' that wanders up from x = 3. The other points that then sit at the same value of y at the end of this tail indicate p-values that are identical. This could occur through over-adjustment by the 5 PCs + 3 covariates. Do you need 5 PCs, and do you have evidence that they are important?

For example, if I look at this, I'd be happy to just have PC1 and PC3 as covariates:

[from Produce PCA bi-plot for 1000 Genomes Phase III - Version 2]

Similar question for the other covariates; also, are these logged or set to some power?

Sample size? Selection criteria? QC of data?

Various things to consider.
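
The flat run of points at the end of the tail (several variants sharing one p-value) can be checked directly from the association results. Below is a minimal sketch in Python, assuming the p-values have already been read into a list. The function names `qq_points` and `tied_pvalues` and the toy p-values are made up for illustration, and the expected quantiles use rank/(n+1), which is one common convention and not necessarily the one qqman uses.

```python
import math
from collections import Counter


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Expected quantiles use rank / (n + 1); this is an assumption made for
    illustration, other plotting positions exist.
    """
    n = len(pvalues)
    ordered = sorted(pvalues)  # ascending, so the smallest p-value has rank 1
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -math.log10(rank / (n + 1))
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


def tied_pvalues(pvalues, min_count=2):
    """Return {p-value: count} for p-values that occur at least min_count times."""
    counts = Counter(pvalues)
    return {p: c for p, c in counts.items() if c >= min_count}


# Toy p-values (made up): three variants share the smallest p-value.
pvals = [0.5, 0.2, 0.04, 1e-6, 1e-6, 1e-6, 0.9, 0.7]
points = qq_points(pvals)
ties = tied_pvalues(pvals)

for expected, observed in points[:3]:
    print(f"expected={expected:.3f} observed={observed:.3f}")
print("tied p-values:", ties)
```

Tied p-values share one observed value but get different expected values, so they plot as a horizontal run. If the ties sit in neighbouring variants, it may be worth checking whether those variants are in strong linkage disequilibrium.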

Kevin

0

Thanks Kevin for the reply. The sample size is 450. I agree with you that I might have over-adjusted; I will try to re-plot using only 3 PCs. I am not sure how relevant it is to include the 3 clinical covariates (BMI, HbA1c, duration of diabetes). I included them because they were statistically significant in my clinical analysis. The data has been properly QCed. Also, do we really need to adjust for population stratification if lambda is <1 in this case? I did a separate PCA for my data before running the linear regression, and I am quite confident that my data doesn't have any ethnic outliers; all data points cluster well with the European population.
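
For reference, the lambda mentioned here (the genomic inflation factor) is usually computed as the median observed 1-df chi-square statistic divided by the median expected under the null. Below is a minimal sketch in Python, assuming a list of p-values strictly between 0 and 1 from a 1-df test. The function name `genomic_inflation` and the toy inputs are made up for illustration.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df, obtained from the normal
# quantile: if Z ~ N(0, 1) then Z**2 ~ chi-square(1).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Estimate lambda from two-sided p-values of a 1-df test.

    Each p-value is converted back to a chi-square statistic through the
    normal quantile function; p-values must satisfy 0 < p <= 1.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy inputs (made up): evenly spaced p-values behave like a perfect null,
# and shifting them towards 1 mimics a deflated test.
n = 999
null_like = [i / (n + 1) for i in range(1, n + 1)]
deflated = [p ** 0.5 for p in null_like]

print(f"null-like lambda: {genomic_inflation(null_like):.3f}")
print(f"deflated lambda:  {genomic_inflation(deflated):.3f}")
```

A value below 1 means the bulk of the test statistics is smaller than expected under the null. On its own it does not say whether the PCs are needed, so comparing lambda with and without each set of covariates is one way to see what drives it.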

Sorry, I didn't get your "logged or set to some power" point.

Thank you once again for the response.

1

I see, diabetic retinopathy? - I was working in that area last year: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696981/

What I mean by 'logged or set to some power' is a transformation performed on the covariates to bring them (their residuals) closer to a normal distribution, e.g., log(HbA1c) or BMI^2.
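
As a rough illustration of what such a transformation does, here is a minimal sketch in Python that compares the skewness of a right-skewed covariate before and after taking logs. The HbA1c values are made up, the helper `skewness` is a hypothetical name, and the sketch looks at the covariate itself, not at model residuals, which is a simplification.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: mean of cubed z-scores (0 for a symmetric sample)."""
    m = mean(values)
    s = pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Made-up, right-skewed HbA1c values (%), for illustration only.
hba1c = [5.8, 6.1, 6.4, 6.9, 7.2, 7.5, 8.3, 9.6, 11.4, 13.8]
log_hba1c = [math.log(v) for v in hba1c]

print(f"skewness raw: {skewness(hba1c):.2f}")
print(f"skewness log: {skewness(log_hba1c):.2f}")
```

Which transformation is appropriate, if any, depends on the covariate; checking a histogram or the regression residuals before and after is a reasonable way to decide.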

For the PC covariates, we would usually include these for population stratification, or did you have another reason for including these? They can be used in place of and to 'summarise' noisy data, to some degree, but usually they are used for population strat.

Another thing for which to watch out is imbalanced groups (disease versus healthy).

Also keep in mind that, in a disease cohort, we would actually expect a tail on the QQ plot, and it is the tail that may refer to those genetic variants that are related to the disease itself.

0

Hi Kevin, yes, it is related to diabetic retinopathy, and the data does include diabetics with diabetic retinopathy along with controls (diabetics without diabetic retinopathy). The phenotype I am using is macular thickness (a quantitative variable), which is expected to be higher in cases than in controls, so I guess that explains the tail. My confusion is about the dip in the tail after roughly x = 5.5; I was expecting the tail to keep going up. I used the PCs solely for population stratification. ![population stratification][1]

No, I haven't transformed the covariates to normalise the residuals. The data is imbalanced (about 1.8 times more controls than cases).

The attached image shows my PCA for population stratification. I excluded all participants with PC2 (y axis) > 0.0 (OWN is my dataset).

Thank you !

[1]: