Question

Why might BOLT-LMM and SAIGE (quantitative trait, linear mixed model) yield different results when run on exactly the same dataset?

5

3.3 years ago

futurolog ▴ 50

As a validation experiment, I ran the same GWAS twice on a quantitative phenotype derived from the UK Biobank, using the UK Biobank genomic data: once with BOLT-LMM and once with the SAIGE linear mixed model (with the quantitative trait type selected). I wanted to see whether the results would be comparable.

However, the SAIGE summary statistics were consistently less significant (i.e. had larger p-values) than the corresponding BOLT ones. In particular, many SNP loci along the genome were significant (at the 10^(-8) significance level) according to BOLT but not significant (at the same level) according to SAIGE.

My question is: why might there be such a discrepancy (e.g. what have I done wrong), and what is the proper way to set up a linear mixed model GWAS run in SAIGE (or, alternatively, its BOLT counterpart) so that the results of BOLT and SAIGE are comparable?

I have included the input tags that I have used in my runs with BOLT and SAIGE (Step1 and Step2), in case this is useful. The (bed, bim, fam), (bgen, bgen-index, sample) as well as (phenotype and covariates table, phenotype and covariate columns) used in both scripts are the same.

BOLT:

/BOLT-LMM_v2.3.4/bolt \
--bed=  --bim=  --fam=  --remove= \
--phenoFile=  --phenoCol=  --covarFile=  --qCovarCol=  --qCovarCol=  --covarCol= \
--LDscoresMatchBp --maxMissingPerIndiv 1 --lmm \
--LDscoresFile=LDSCORE.1000G_EUR.tab.gz --geneticMapFile=genetic_map_hg19.txt.gz \
--numThreads=32 --bgenFile=1.bgen --sampleFile=1.sample \
--statsFile=   --statsFileBgenSnps=

SAIGE Step 1:

/SAIGE/SAIGE-0.35.8.3/extdata/step1_fitNULLGLMM.R \
--plinkFile=  --phenoFile=  --sampleIDColinphenoFile= --phenoCol=  \
--traitType=quantitative --invNormalize=TRUE \
--covarColList=  --outputPrefix=  --nThreads=32 --LOCO=FALSE --tauInit=1,0

SAIGE Step 2:

/SAIGE/SAIGE-0.35.8.3/extdata/step2_SPAtests.R --minMAF=  --minMAC=  \
--bgenFile=  --bgenFileIndex=  --sampleFile= \
--GMMATmodelFile=  --varianceRatioFile= \
--SAIGEOutputFile=  --numLinesOutput=2 --IsDropMissingDosages=FALSE --LOCO=FALSE

BOLT-LMM SAIGE GWAS replication of results • 2.7k views

updated 2.6 years ago by yiorkala ▴ 10 • written 3.3 years ago by futurolog ▴ 50

score 1 · Answer 1 · 2021-09-16

Hi,

When you say BOLT-LMM, are you referring to the non-infinitesimal model (also known as the mixture-of-Gaussians model)? With the "--lmm" flag, BOLT reports two sets of test statistics and p-values, "P_BOLT_LMM" and "P_BOLT_LMM_INF". If you compared "P_BOLT_LMM" with SAIGE, note that it is "P_BOLT_LMM_INF" (the infinitesimal model) that should be similar to SAIGE in terms of the model. The non-infinitesimal BOLT-LMM statistic is expected to have higher power than both BOLT-LMM-inf and SAIGE, because its Bayesian mixture prior models the effect sizes better.
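If it helps, here is a rough sketch of how you could check this on your own output: count how many shared SNPs pass your threshold under each of the three p-value columns. The two toy files and their values are made up purely for illustration, and the helper name compare_hits is hypothetical. I am also assuming the header names "SNP", "P_BOLT_LMM_INF" and "P_BOLT_LMM" on the BOLT side and "SNPID" and "p.value" on the SAIGE side, so please check them against your actual files.

```shell
#!/bin/sh
# Toy stand-ins for the two summary-statistics files (values are made up).
tmp=$(mktemp -d)
printf 'SNP\tP_BOLT_LMM_INF\tP_BOLT_LMM\n' > "$tmp/bolt.tsv"
printf 'rs1\t2e-9\t5e-10\nrs2\t3e-7\t4e-9\nrs3\t0.2\t0.3\n' >> "$tmp/bolt.tsv"
printf 'SNPID p.value\nrs1 4e-9\nrs2 6e-7\nrs3 0.25\n' > "$tmp/saige.txt"

# Hypothetical helper: compare_hits BOLT_FILE SAIGE_FILE THRESHOLD
# Columns are located by header name (assumed names, see above), not by position.
compare_hits() {
  awk -v thr="$3" '
    # Return the index of the column whose header equals name (0 if absent).
    function col(name,   i) {
      for (i = 1; i <= NF; i++) if ($i == name) return i
      return 0
    }
    # Header lines: the first file is BOLT, the second is SAIGE.
    FNR == 1 {
      if (NR == 1) { bs = col("SNP"); bi = col("P_BOLT_LMM_INF"); bm = col("P_BOLT_LMM") }
      else         { ss = col("SNPID"); sp = col("p.value") }
      next
    }
    # BOLT rows: remember both p-values per SNP.
    NR == FNR { pinf[$bs] = $bi; pmix[$bs] = $bm; next }
    # SAIGE rows: count threshold hits among SNPs present in both files.
    ($ss in pinf) {
      shared++
      if (pmix[$ss] + 0 < thr) n_mix++
      if (pinf[$ss] + 0 < thr) n_inf++
      if ($sp + 0 < thr) n_saige++
    }
    END {
      printf "shared=%d bolt_lmm=%d bolt_lmm_inf=%d saige=%d\n", shared, n_mix, n_inf, n_saige
    }
  ' "$1" "$2"
}

result=$(compare_hits "$tmp/bolt.tsv" "$tmp/saige.txt" 1e-8)
echo "$result"
rm -rf "$tmp"
```

On the toy data, rs2 passes the 10^(-8) threshold only under "P_BOLT_LMM". If your real data show the same pattern, with the "P_BOLT_LMM_INF" count close to the SAIGE count, most of the discrepancy probably comes from comparing different models rather than from a mistake in your setup.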

Another major difference, judging from the way you invoke SAIGE, is the lack of LOCO (leave one chromosome out): both your Step 1 and Step 2 commands set "--LOCO=FALSE". BOLT uses LOCO by default, as this is well known to increase power in GWAS, so both BOLT statistics would achieve higher power than SAIGE without LOCO.

Hope these 2 points make sense :)