Question: Plink2 Linear Regression for Binary phenotypes
9 days ago by
anoops30 wrote:


I am trying to run a GWAS with linear regression using Plink2. I call --linear; however, the moment plink2 detects a binary phenotype, it switches to logistic regression. Has anyone else faced this issue? Is there a workaround?


$plink2 --pfile $pfile \
        --pheno $pheno_file \
        --pheno-name $pheno_name \
        --keep $samples_list \
        --maf 0.01 \
        --linear cols=+a1freq,+machr2 \
        --vif 999 \
        --covar $pheno_file \
        --covar-name $covar_list \
        --covar-variance-standardize \
        --memory 19000 \
        --threads 10 \
        --adjust \
        --out "$outdir"

PLINK v2.00a2LM 64-bit Intel (29 Apr 2019)
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /scratch/scratch3/op_genomic2/anoop/TAA_20200903/5_full_HARE_15pcs/3_assoc_glm_lin/out_plinkLIN1.log.
Options in effect:
  --covar pheno.tsv
  --covar-name AGE, GENDER, PC1, PC2, PC3, PC4, PC5
  --glm cols=+a1freq,+machr2
  --keep pheno.tsv 
  --maf 0.01
  --memory 19000
  --out plink_out
  --pfile chr10
  --pheno pheno.tsv
  --pheno-name Pheno
  --threads 10
  --vif 999

1 binary phenotype loaded 
--glm logistic regression on phenotype 'Pheno': done.

Seems like a bug, but the manual is a little ambiguous so it might be by design. Any suggestions would be greatly appreciated.



Thanks chrchang523.

I understand this. However, my dataset is large relative to my computational resources, and logistic regression is a lot more expensive.

I will try the hack, thanks!

Reply written 8 days ago by anoops30

If you don't have missing genotypes, the 'cc-residualize' modifier implemented in July 2020 is a better way to trade off a little accuracy for speed.
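
A minimal sketch of what such an invocation might look like, reusing the shell variables from the question. It assumes a PLINK 2 build from July 2020 or later (the 29 Apr 2019 build shown in the log predates the modifier), and it assumes that 'cc-residualize' has to be combined with 'hide-covar', which is my understanding of the --glm documentation; running `plink2 --help glm` on your own build should confirm the exact requirements.

```shell
# Sketch only: the variables are the ones defined in the original question.
# Assumption: cc-residualize requires hide-covar (verify with `plink2 --help glm`).
plink2 --pfile "$pfile" \
       --pheno "$pheno_file" \
       --pheno-name "$pheno_name" \
       --keep "$samples_list" \
       --maf 0.01 \
       --glm hide-covar cc-residualize cols=+a1freq,+machr2 \
       --covar "$pheno_file" \
       --covar-name $covar_list \
       --covar-variance-standardize \
       --out "$outdir"
```

The phenotype stays coded as case/control here, so the model remains logistic/Firth and only the covariate handling is approximated.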

Reply written 8 days ago by chrchang5237.3k
8 days ago by
United States
chrchang5237.3k wrote:

This is by design, since logistic/Firth regression is almost always more appropriate for binary phenotypes. See for some context.

You can override this by recoding the phenotype from {1, 2} to e.g. {2, 3}, but this is very strongly discouraged unless you have a specific reason why linear regression should be more accurate than logistic/Firth regression here.
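
For illustration, the recoding described above could be done with a short awk pass over the phenotype file. This is only a sketch: the file names, the tab-separated layout with a header row, and the column name `Pheno` are assumptions taken loosely from the log in the question, and the demo input is made up so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical file and column names; adjust to your own phenotype file.
pheno_in="pheno_demo.tsv"
pheno_out="pheno_recoded.tsv"
pheno_col="Pheno"

# Tiny made-up input: 1 = control, 2 = case, NA = missing.
printf 'IID\tAGE\tPheno\nS1\t54\t1\nS2\t61\t2\nS3\t47\tNA\n' > "$pheno_in"

# Find the phenotype column from the header, then add 1 to every 1/2 value
# so that {1, 2} becomes {2, 3}. Any other value (e.g. NA) is left unchanged.
awk -v col="$pheno_col" 'BEGIN { FS = OFS = "\t" }
NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
c && ($c == "1" || $c == "2") { $c = $c + 1 }
{ print }' "$pheno_in" > "$pheno_out"
```

Passing the recoded file to --pheno with the same --pheno-name should make plink2 load the phenotype as quantitative rather than binary, so --glm fits a linear model. The reported effect sizes are then on the linear scale of the recoded 2/3 values, not log-odds.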
