Question

What is the state of the art in statistical algorithms for GWAS of case/control and quantitative traits?

2

4.7 years ago

b.ambrozio ▴ 30

Hello! I'm trying to understand what the best algorithm for GWAS is nowadays. I know we have many tools available, like Plink and Hail, but what is currently the best algorithm if I don't use any of them? Let's say I write a script in R or Python from scratch. Which statistical algorithm should I use? Is it linear mixed models (LMMs)? I'm confused because we can have binary phenotypes (case/control) or quantitative phenotypes. LMMs seem to address quantitative ones, but can they be used for case/control as well? What is the state of the art for each of them? Peer-reviewed papers as references would be appreciated. Thanks!

GWAS LMM • 2.3k views

updated 4.7 years ago by chrchang523 10k • written 4.7 years ago by b.ambrozio ▴ 30

0

Actually, what is the state of the art for both/each of them?

That would be Plink.

4.7 years ago by WouterDeCoster 47k

score 6 · Accepted Answer · 2019-12-02

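The main regression executed by Plink (linear regression for quantitative traits or logistic regression for case/control traits, with the top principal components included as covariates to correct for population stratification) follows the approach introduced by EIGENSTRAT in ~2006; see https://www.nature.com/articles/ng1847 . This is actually straightforward to write in R/Python from scratch; the harder part is optimizing the implementation for large datasets.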
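To give a feel for how little code the basic version needs, here is a minimal Python sketch of a per-variant linear regression with principal components as covariates, for a quantitative trait. It is not Plink's or EIGENSTRAT's actual implementation. The function names `top_pcs` and `linear_gwas` are made up for illustration, and the sketch assumes a small, dense samples x variants genotype matrix with no missing values and no monomorphic variants.

```python
import numpy as np
from scipy import stats


def top_pcs(genotypes, n_pcs=10):
    """Top principal components of a samples x variants genotype matrix."""
    g = genotypes - genotypes.mean(axis=0)
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]  # standardize, dropping constant variants
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def linear_gwas(genotypes, phenotype, covariates):
    """Test each variant in the model y ~ intercept + covariates + variant.

    Assumes no missing genotypes and no monomorphic variants.
    Returns per-variant effect sizes, standard errors and two-sided p-values.
    """
    n = len(phenotype)
    X = np.column_stack([np.ones(n), covariates])
    # Project the covariates out of both phenotype and genotypes.
    # By the Frisch-Waugh-Lovell theorem this gives the same variant
    # coefficient as fitting the full model once per variant.
    q, _ = np.linalg.qr(X)
    y = phenotype - q @ (q.T @ phenotype)
    g = genotypes - q @ (q.T @ genotypes)
    gg = (g * g).sum(axis=0)
    gy = g.T @ y
    beta = gy / gg
    dof = n - X.shape[1] - 1  # one extra parameter for the variant itself
    rss = y @ y - beta * gy
    se = np.sqrt(rss / dof / gg)
    pvals = 2 * stats.t.sf(np.abs(beta / se), dof)
    return beta, se, pvals
```

Usage would be along the lines of `beta, se, p = linear_gwas(G, y, top_pcs(G, 10))`. Everything here runs in memory on a full SVD, which is exactly the part that stops scaling on biobank-sized data.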

The Firth regression added to Plink 2.0 to improve handling of rare variants and imbalanced binary phenotypes was motivated by https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4049324/ .
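For reference, Firth regression itself is also short to write down: it is logistic regression with a Jeffreys-prior penalty, which keeps the estimates finite even when a rare variant separates cases from controls. The following is a rough Python sketch of the textbook Fisher-scoring iteration, not Plink 2.0's code. The function name `firth_logistic`, the step cap of 5 and the convergence tolerance are arbitrary choices for illustration.

```python
import numpy as np


def firth_logistic(X, y, max_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via Fisher scoring.

    X is a samples x parameters design matrix (include an intercept column),
    y is a 0/1 vector. Returns coefficients and Wald standard errors.
    """
    beta = np.zeros(X.shape[1])

    def fit_pieces(b):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))
        return p, w, info_inv

    for _ in range(max_iter):
        p, w, info_inv = fit_pieces(beta)
        # Leverages: diagonal of the weighted hat matrix.
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score adds h * (0.5 - p) to the usual residual.
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > 5.0:  # crude cap to avoid overshooting
            step *= 5.0 / biggest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    _, _, info_inv = fit_pieces(beta)
    return beta, np.sqrt(np.diag(info_inv))
```

In a GWAS setting, `X` would hold an intercept, the covariates and one variant at a time. This sketch returns Wald standard errors only; a penalized likelihood ratio test is a common alternative for the p-value.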

Mixed linear models provide better statistical power when you have lots of close relatives in your dataset, but are much trickier to fit efficiently; this is still a significant research area. Two tools covering parts of the current state of the art are SAIGE (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119127/ ; handles imbalanced binary phenotypes, but relatively slow) and fastGWA (https://www.nature.com/articles/s41588-019-0530-8 ; great speed, but doesn't support dosage data yet and uses a misspecified model for binary phenotypes).