Question

Plink2 Linear Regression for Binary phenotypes

3.6 years ago

anoops ▴ 40

Hello,

I am trying to run a GWAS with linear regression using Plink2. I request --linear; however, the moment plink2 detects a binary phenotype it switches to logistic regression. Has anyone else faced this issue? Is there a workaround?

Plink2:

$plink2 --pfile $pfile \
        --pheno $pheno_file \
        --pheno-name $pheno_name \
        --keep $samples_list \
        --maf 0.01 \
        --linear cols=+a1freq,+machr2 \
        --vif 999 \
        --covar $pheno_file \
        --covar-name $covar_list \
        --covar-variance-standardize \
        --memory 19000 \
        --threads 10 \
        --adjust \
        --out "$outdir"

Log:
PLINK v2.00a2LM 64-bit Intel (29 Apr 2019)     www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /scratch/scratch3/op_genomic2/anoop/TAA_20200903/5_full_HARE_15pcs/3_assoc_glm_lin/out_plinkLIN1.log.
Options in effect:
  --adjust
  --covar pheno.tsv
  --covar-name AGE, GENDER, PC1, PC2, PC3, PC4, PC5
  --covar-variance-standardize
  --glm cols=+a1freq,+machr2
  --keep pheno.tsv 
  --maf 0.01
  --memory 19000
  --out plink_out
  --pfile chr10
  --pheno pheno.tsv
  --pheno-name Pheno
  --threads 10
  --vif 999

--
1 binary phenotype loaded 
--glm logistic regression on phenotype 'Pheno': done.

Seems like a bug, but the manual is a little ambiguous so it might be by design. Any suggestions would be greatly appreciated.

Thanks!

SNP snp software error • 2.2k views

Thanks chrchang523.

I understand this. However, my dataset is large relative to my computational resources, and logistic regression is a lot more expensive.

I will try the hack, thanks!

Reply • 3.6 years ago by anoops ▴ 40

If you don't have missing genotypes, the 'cc-residualize' modifier implemented in July 2020 is a better way to trade off a little accuracy for speed.
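As a rough sketch of what that could look like, the helper below only prints a candidate command so it can be reviewed before running. The function name and its argument order are made up for illustration. Two assumptions to verify with `plink2 --help glm` on your build: that 'cc-residualize' is given as a --glm modifier, and that it needs to be paired with 'hide-covar'. Note that the build in the log above is dated 29 Apr 2019, so it predates the modifier and would need upgrading.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a plink2 --glm command line that
# uses the cc-residualize modifier.
# Arguments: pfile prefix, phenotype file, phenotype name, output prefix,
# then one or more covariate column names.
glm_cc_residualize_cmd() {
  local pfile="$1" pheno_file="$2" pheno_name="$3" out="$4"
  shift 4   # everything left is a covariate column name
  # Assumption: hide-covar is required alongside cc-residualize.
  printf '%s ' plink2 \
    --pfile "$pfile" \
    --pheno "$pheno_file" --pheno-name "$pheno_name" \
    --covar "$pheno_file" --covar-name "$@" \
    --covar-variance-standardize \
    --maf 0.01 \
    --glm hide-covar cc-residualize cols=+a1freq,+machr2 \
    --out "$out"
  echo
}
```

For example, `glm_cc_residualize_cmd chr10 pheno.tsv Pheno plink_out AGE GENDER PC1 PC2` prints the command line for inspection.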

Reply • 3.6 years ago by chrchang523 10k

score 1 · Answer 1 · 2020-09-15

This is by design, since logistic/Firth regression is almost always more appropriate for binary phenotypes. See https://en.wikipedia.org/wiki/Statistical_model_specification for some context.

You can override this by recoding the phenotype from {1, 2} to e.g. {2, 3}, but this is very strongly discouraged unless you have a specific reason why linear regression should be more accurate than logistic/Firth regression here.
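If you do have such a reason, the recoding itself can be sketched as below. This is a minimal illustration, not an official PLINK utility: the function name is made up, and it assumes a tab-delimited phenotype file with a header row, with controls coded 1 and cases coded 2. Any other value in the phenotype column is written out as NA, on the assumption that it was a missing-value code; otherwise a 0 could end up being read as a real quantitative value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recode_pheno FILE COLUMN
# Prints FILE with COLUMN recoded 1 -> 2 and 2 -> 3, so that the phenotype
# no longer looks like a {1, 2} case/control column.
# Assumptions: tab-delimited input, header on line 1, and every value other
# than 1 or 2 in COLUMN is a missing code (rewritten as NA).
recode_pheno() {
  awk -v col="$2" 'BEGIN { FS = OFS = "\t" }
    NR == 1 {
      # locate the phenotype column by its header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print; next
    }
    $c == "1" || $c == "2" { $c = $c + 1; print; next }
    { $c = "NA"; print }' "$1"
}
```

Usage would be along the lines of `recode_pheno pheno.tsv Pheno > pheno_recoded.tsv`, then passing the new file to --pheno. Only the named column is touched, so covariate columns in the same file are left as they were. Shifting both codes by one leaves the regression slope unchanged; only the intercept moves.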