Question: Plink2 Linear Regression for Binary phenotypes
anoops wrote (9 days ago):

Hello,

I am trying to run a GWAS with linear regression using Plink2. I call --linear; however, the moment plink2 detects a binary phenotype, it switches to logistic regression. Has anyone else faced this issue? Is there a workaround?

Plink2:

$plink2 --pfile $pfile \
        --pheno $pheno_file \
        --pheno-name $pheno_name \
        --keep $samples_list \
        --maf 0.01 \
        --linear cols=+a1freq,+machr2 \
        --vif 999 \
        --covar $pheno_file \
        --covar-name $covar_list \
        --covar-variance-standardize \
        --memory 19000 \
        --threads 10 \
        --adjust \
        --out "$outdir"

Log:
PLINK v2.00a2LM 64-bit Intel (29 Apr 2019)     www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /scratch/scratch3/op_genomic2/anoop/TAA_20200903/5_full_HARE_15pcs/3_assoc_glm_lin/out_plinkLIN1.log.
Options in effect:
  --adjust
  --covar pheno.tsv
  --covar-name AGE, GENDER, PC1, PC2, PC3, PC4, PC5
  --covar-variance-standardize
  --glm cols=+a1freq,+machr2
  --keep pheno.tsv 
  --maf 0.01
  --memory 19000
  --out plink_out
  --pfile chr10
  --pheno pheno.tsv
  --pheno-name Pheno
  --threads 10
  --vif 999

--
1 binary phenotype loaded 
--glm logistic regression on phenotype 'Pheno': done.

Seems like a bug, but the manual is a little ambiguous so it might be by design. Any suggestions would be greatly appreciated.

Thanks!


Thanks chrchang523.

I understand this. However, I am trying to analyze a dataset that is large compared to my computational resources, and logistic regression is a lot more expensive.

I will try the hack, thanks!

written 8 days ago by anoops

If you don't have missing genotypes, the 'cc-residualize' modifier implemented in July 2020 is a better way to trade off a little accuracy for speed.
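As a rough sketch of what that could look like, here is the command from the question with 'cc-residualize' added to --glm. The variable names are the ones from the original post; wrapping the call in a function (the name run_glm_cc_residualize is hypothetical) is only for illustration. Pairing the modifier with 'hide-covar' is an assumption based on my reading of the --glm documentation, so check it against the docs for your build. Note also that the log above comes from a 29 Apr 2019 build, which predates the modifier, so a newer plink2 binary would be needed.

```shell
# Sketch only: same options as the original call, with --linear replaced by
# --glm plus the 'cc-residualize' modifier. 'hide-covar' is an assumed
# requirement; verify against the --glm documentation for your plink2 build.
run_glm_cc_residualize() {
    # $covar_list is left unquoted on purpose so that a space-separated
    # list of covariate names is split into separate arguments.
    "$plink2" --pfile "$pfile" \
        --pheno "$pheno_file" \
        --pheno-name "$pheno_name" \
        --keep "$samples_list" \
        --maf 0.01 \
        --glm cc-residualize hide-covar cols=+a1freq,+machr2 \
        --vif 999 \
        --covar "$pheno_file" \
        --covar-name $covar_list \
        --covar-variance-standardize \
        --memory 19000 \
        --threads 10 \
        --adjust \
        --out "$outdir"
}
```

With the variables from the question set in the environment, calling run_glm_cc_residualize runs the association test. The phenotype stays coded as binary, so the model is still logistic, just with the covariate part handled in a cheaper approximate step.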

written 8 days ago by chrchang523

chrchang523 wrote (8 days ago):

This is by design, since logistic/Firth regression is almost always more appropriate for binary phenotypes. See https://en.wikipedia.org/wiki/Statistical_model_specification for some context.

You can override this by recoding the phenotype from {1, 2} to e.g. {2, 3}, but this is very strongly discouraged unless you have a specific reason why linear regression should be more accurate than logistic/Firth regression here.
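For anyone who does have such a reason, the recoding itself can be sketched with awk. This assumes a tab-separated phenotype file with a header row, as the pheno.tsv name in the log suggests; the function name recode_pheno and the output file name in the usage note are hypothetical. Only values of exactly 1 or 2 are shifted; anything else (e.g. NA) is passed through unchanged. Check how missing values are coded first, since a 0 that meant "missing" under case/control coding would presumably be read as an ordinary number once the column is treated as quantitative.

```shell
# Sketch: shift a {1, 2} case/control column to {2, 3} so that plink2 no
# longer recognizes it as binary. Assumes tab-separated input with a header.
# Usage: recode_pheno <input.tsv> <output.tsv> <phenotype column name>
recode_pheno() {
    awk -v col="$3" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 {
            # Locate the phenotype column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        {
            # Recode 1 -> 2 and 2 -> 3; leave every other value untouched.
            if ($c == 1 || $c == 2) $c = $c + 1
            print
        }
    ' "$1" > "$2"
}
```

For example, `recode_pheno pheno.tsv pheno_recoded.tsv Pheno` writes a copy that can then be passed to --pheno. The original file is left as-is, which keeps it usable for --covar and --keep.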
