Question

Sea of genome-wide significant hits in GWAS (PLINK2) and none when using REGENIE

14 days ago

a.papadam • 0

Hi all,

I recently performed a GWAS on a continuous measurement (the previous GWAS used a relative binary measurement), but when I run PLINK2, it returns thousands of genome-wide hits without forming any clear peaks. In contrast, when I use REGENIE, there are no significant hits.

This could be due to noise, a confounder, or a genuinely polygenic signal with subtle effects.

However, I am unsure how to test each scenario and correct for it if possible. I have tried various covariates in my GWAS, different scaling methods, excluding outliers, etc.

Any insights?

Thanks in advance.

Genome-wide GWAS snps regenie plink2 • 694 views

Updated 1 day ago by Kevin Blighe 89k • written 14 days ago by a.papadam • 0

Hi,

It would be useful to know the commands you used to perform these association tests.

The first thing that comes to mind is that the p-values from PLINK are not corrected for multiple testing, so your "significant hits" may no longer be significant once you correct for the number of tests you are running (see "Basic multiple testing correction" at https://www.cog-genomics.org/plink/2.0/assoc).

Also, if you have some family structure in your dataset, a mixed model (e.g., regenie) can account for that, whereas a plain PLINK association test cannot.

Comment • 9 days ago by Corentin ▴ 660

Hello,

Thank you for replying.

The flags I used are the following: --covar-variance-standardize --geno 0.05 --glm hide-covar --hwe 1e-6 midp keep-fewhet --maf 0.01 --mind 0.1 --no-input-missing-phenotype --ci 0.95 --mach-r2-filter 0.8 2

Regarding the multiple testing, I report the raw PLINK2 --glm P values, which are extremely small (around 5×10^-200), far smaller than the 5×10^-8 threshold.

Regarding the family structure, I have removed related individuals.

Thank you in advance.

Comment • 6 days ago by a.papadam • 0

score 0 · Answer 1 · 2025-11-07

Hey,

Thanks for posting - always good to see GWAS discussions here. I'll try to address your query based on what you've shared (including the command and clarifications in the comments). The discrepancy between PLINK2 (thousands of diffuse genome-wide hits) and REGENIE (none) is a classic sign of unaccounted-for inflation in PLINK2, likely from residual population structure, cryptic relatedness, or other confounders, rather than a true polygenic signal (which would typically show some clustering or peaks, even if broad). Noise or over-correction in REGENIE could also play a role, but given your extremely low p-values in PLINK2 (e.g., 5e-200), this screams lambda inflation. A genuinely polygenic trait with subtle effects shouldn't flatten the Manhattan plot like that without some loci standing out.

Here are some steps to diagnose and test each scenario - I'll focus on practical checks since you've already tried covariates, scaling, and outlier removal. I'll assume you're using standard QC'd data (e.g., MAF >0.01, HWE, etc., as in your command).

1. Check for Inflation / Confounding (Population Structure or Cryptic Relatedness)

Why this matters: PLINK2's --glm is a fixed-effects linear model that doesn't inherently handle relatedness or structure unless you add covariates (like PCs). REGENIE fits a whole-genome ridge regression in step 1 and conditions on the resulting polygenic predictions in step 2, which behaves much like a mixed model (without building an explicit GRM) and could deflate your signals if they were spurious. Even if you've removed obvious relatives (e.g., via KING cutoff), subtle structure can remain, especially in diverse cohorts.
How to test:
- Generate QQ plots and calculate genomic control lambda for both tools. High lambda (>1.1-1.2) in PLINK2 suggests inflation; if REGENIE's is ~1, that's your culprit.
  - In PLINK2: Your output already has p-values - use R to plot (e.g., qqman package) or online tools.
  - For REGENIE: Same, check its .regenie output.
- Run PCA on your genotypes (e.g., via PLINK2 or flashpca) and include top 10-20 PCs as covariates in PLINK2 to mimic REGENIE's correction.
  - Command example:
```
plink2 --bfile your_data --pca 20 --out pca_results
```
    Then add to your GWAS:
```
plink2 --bfile your_data --glm hide-covar --pheno your_pheno --covar pca_results.eigenvec --covar-name PC1-PC20 --covar-variance-standardize --geno 0.05 --hwe 1e-6 midp keep-fewhet --maf 0.01 --mind 0.1 --ci 0.95 --mach-r2-filter 0.8 2 --out plink_with_pcs
```
    If hits drop dramatically, structure was confounding.
- For cryptic relatedness: Recheck with --make-king in PLINK2 and plot relatedness vs. phenotype correlation. If clustered, REGENIE's over-correction makes sense.
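
If you'd rather not go through R for the lambda check above, here is a minimal Python sketch using only the standard library. It is not from either tool's documentation: the function names are mine, and the column names ("P" for PLINK2 .glm.linear output, "LOG10P" for REGENIE output) are assumptions you should check against your own file headers.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution under the null (~0.4549).
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_lambda(pvalues):
    """Genomic-control lambda: observed median chi-square / null median."""
    p_med = median(pvalues)
    # A 1-df chi-square statistic is z**2 for a two-sided p-value, so the
    # statistic at the median p-value is inv_cdf(p/2)**2. The transform is
    # monotone, so this matches the median of the per-variant statistics
    # (up to interpolation between the two middle values).
    z = NormalDist().inv_cdf(p_med / 2)
    return z * z / NULL_MEDIAN_CHI2


def read_pvalues(path, column="P", is_log10=False):
    """Read one column from a whitespace-delimited results file with a header.

    Assumed column names: "P" for PLINK2 output, "LOG10P" (with
    is_log10=True) for REGENIE output. Missing values are skipped.
    """
    pvals = []
    with open(path) as handle:
        header = handle.readline().split()
        idx = header.index(column)
        for line in handle:
            field = line.split()[idx]
            if field in ("NA", "nan", "."):
                continue
            value = float(field)
            pvals.append(10 ** -value if is_log10 else value)
    return pvals


# Hypothetical usage (file names are placeholders):
# print(genomic_lambda(read_pvalues("plink_results.glm.linear")))
# print(genomic_lambda(read_pvalues("regenie_results.regenie", "LOG10P", True)))
```

A lambda near 1 from REGENIE next to a much larger one from PLINK2 would support the inflation explanation.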

2. Rule Out Noise or Technical Artifacts

Why this matters: Your continuous trait might have non-normal distribution, batch effects, or imputation issues leading to noisy signals in PLINK2.
How to test:
- Check phenotype normality: Histogram/boxplot in R. If skewed, try log/box-cox transformation and re-run both tools.
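- Inspect imputation quality: Your --mach-r2-filter 0.8 2 is good, but also stratify results by the INFO/R² score from your imputation output. As far as I know PLINK2 has no plain --info flag; for VCF-derived data, --extract-if-info with an expression on the relevant INFO key (e.g., "R2 >= 0.8", key name depending on your imputation software) can apply the filter. Low-quality SNPs can inflate p-values diffusely.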
- Batch effects: If samples from different arrays/centers, include batch as a covariate or check PCA for clustering.
- Compare subsets: Run GWAS on random halves of your cohort - if hits don't replicate internally, it's noise.
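
For the random-halves comparison, a rough sketch of the bookkeeping in Python (standard library only). The function names are hypothetical, and sign concordance is just one simple way I'd summarise replication, not something either tool reports. The idea: write each half to a sample list for plink2 --keep, run the GWAS on each, then check whether the top hits from one half point in the same direction in the other.

```python
import random


def split_halves(sample_ids, seed=1):
    """Shuffle sample IDs reproducibly and return two disjoint halves."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def sign_concordance(beta_a, beta_b):
    """Fraction of shared variants with the same effect sign in both halves.

    beta_a, beta_b: dicts mapping variant ID -> effect estimate (BETA),
    e.g. the top hits from half A and the same variants in half B.
    """
    shared = [v for v in beta_a if v in beta_b]
    if not shared:
        raise ValueError("no variants in common")
    same = sum(1 for v in shared if beta_a[v] * beta_b[v] > 0)
    return same / len(shared)
```

Concordance close to 0.5 for the top hits would point to noise. Keep in mind that confounding shared by both halves (e.g., population structure) will replicate too, so high concordance on its own does not prove the hits are real.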

3. Assess Polygenic Signal with Subtle Effects

Why this matters: If truly polygenic (like height), you'd expect broad signals, but not a "sea" without any loci enriched. REGENIE might be conservative here.
How to test:
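- Compute polygenic risk scores (PRS) from your PLINK2 hits (e.g., via PRSice or LDpred) and test correlation with phenotype in samples that were not used to estimate the effects (a held-out subset), since scoring the discovery samples themselves will overfit. Low PRS R² suggests spurious hits.
- Heritability estimation: Use GCTA (GREML, which needs individual-level genotypes) or LDSC (which works on summary statistics). If h²_SNP is high but there are no peaks, the trait is probably polygenic; if the test statistics are inflated in PLINK2 relative to REGENIE, suspect confounding. The LDSC intercept helps here: an intercept well above 1 points to confounding rather than polygenicity.
  - For LDSC: format your summary statistics with munge_sumstats.py and run ldsc.py with --h2 (https://github.com/bulik/ldsc).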
- Compare to known traits: If your measurement is similar to a polygenic one (e.g., BMI), check overlap with public GWAS (e.g., GWAS Catalog).
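
To make the PRS check concrete, here is a deliberately naive Python sketch: a plain weighted sum of effect-allele dosages and the squared correlation with the phenotype. This is not what PRSice or LDpred do (no clumping, no shrinkage), and the function names and input layout are my own assumptions. It is only meant to show what "PRS R²" measures when weights come from one half of the cohort and scores are evaluated in the other.

```python
def polygenic_score(dosages, weights):
    """Weighted allele-count sum for one individual.

    dosages: dict variant ID -> effect-allele dosage (0..2)
    weights: dict variant ID -> effect size estimated in the training half
    Variants missing from dosages are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the inputs")
    return sxy * sxy / (sxx * syy)
```

Compute one score per held-out individual, then pass the scores and the matching phenotype values to r_squared. Thousands of "hits" that explain almost nothing out of sample are unlikely to be real.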

4. Tool-Specific Differences and Fixes

PLINK2 raw p-values aren't multiple-testing corrected (as noted in comments), but with 5e-200, that's not the issue - it's inflation. REGENIE's step 1 (null model) might be absorbing variance if your trait is highly heritable.
Try BOLT-LMM or SAIGE as alternatives - they handle structure like REGENIE but might give intermediate results.
Re-run REGENIE with looser parameters (e.g., --minMAC 10 if rare variants) or without sparse GRM to see if signals emerge.

If you share QQ/Manhattan plots or lambda values, that'd help narrow it down. Also, check the PLINK2/REGENIE docs for continuous traits: https://www.cog-genomics.org/plink/2.0/assoc and https://rgcgithub.github.io/regenie/. Others, feel free to chime in!

Kevin