Plink2 Linear Regression for Binary phenotypes
3.6 years ago
anoops ▴ 40

Hello,

I am trying to run a GWAS with linear regression using Plink2. I request --linear; however, as soon as plink2 detects a binary phenotype, it switches to logistic regression. Has anyone else faced this issue? Is there a workaround?

Plink2:

$plink2 --pfile $pfile \
        --pheno $pheno_file \
        --pheno-name $pheno_name \
        --keep $samples_list \
        --maf 0.01 \
        --linear cols=+a1freq,+machr2 \
        --vif 999 \
        --covar $pheno_file \
        --covar-name $covar_list \
        --covar-variance-standardize \
        --memory 19000 \
        --threads 10 \
        --adjust \
        --out "$outdir"

Log:
PLINK v2.00a2LM 64-bit Intel (29 Apr 2019)     www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /scratch/scratch3/op_genomic2/anoop/TAA_20200903/5_full_HARE_15pcs/3_assoc_glm_lin/out_plinkLIN1.log.
Options in effect:
  --adjust
  --covar pheno.tsv
  --covar-name AGE, GENDER, PC1, PC2, PC3, PC4, PC5
  --covar-variance-standardize
  --glm cols=+a1freq,+machr2
  --keep pheno.tsv 
  --maf 0.01
  --memory 19000
  --out plink_out
  --pfile chr10
  --pheno pheno.tsv
  --pheno-name Pheno
  --threads 10
  --vif 999

--
1 binary phenotype loaded 
--glm logistic regression on phenotype 'Pheno': done.

Seems like a bug, but the manual is a little ambiguous so it might be by design. Any suggestions would be greatly appreciated.

Thanks!

SNP snp software error • 2.2k views

Thanks chrchang523.

I understand this. However, the dataset I am trying to analyze is large relative to my computational resources, and logistic regression is a lot more expensive.

I will try the hack, thanks!


If you don't have missing genotypes, the 'cc-residualize' modifier implemented in July 2020 is a better way to trade off a little accuracy for speed.
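For illustration, a run with this modifier might look like the sketch below. Note that the build shown in the log above (29 Apr 2019) predates July 2020, so a newer plink2 binary would be needed. The function name `run_glm_ccres`, the `PLINK2` variable, and the argument layout are hypothetical. Pairing 'cc-residualize' with 'hide-covar' reflects my reading of the --glm documentation and should be checked against the version you install.

```shell
# Sketch only: run_glm_ccres and the PLINK2 variable are hypothetical helpers,
# not part of plink2. Covariate names are taken from the log in the question.
run_glm_ccres() {
    # $1 = pfile prefix, $2 = phenotype/covariate file, $3 = output prefix
    "${PLINK2:-plink2}" \
        --pfile "$1" \
        --pheno "$2" \
        --pheno-name Pheno \
        --covar "$2" \
        --covar-name AGE GENDER PC1 PC2 PC3 PC4 PC5 \
        --covar-variance-standardize \
        --maf 0.01 \
        --glm hide-covar cc-residualize cols=+a1freq,+machr2 \
        --out "$3"
}

# Dry run: print the arguments instead of invoking plink2.
# PLINK2=echo run_glm_ccres chr10 pheno.tsv plink_out_ccres
```

Setting `PLINK2=echo` prints the argument list without invoking plink2, which is a cheap way to check the command before submitting a long job.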

3.6 years ago

This is by design, since logistic/Firth regression is almost always more appropriate for binary phenotypes. See https://en.wikipedia.org/wiki/Statistical_model_specification for some context.

You can override this by recoding the phenotype from {1, 2} to e.g. {2, 3}, but this is very strongly discouraged unless you have a specific reason why linear regression should be more accurate than logistic/Firth regression here.
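A minimal sketch of that recoding, assuming a tab-delimited phenotype file with a header row and a case/control column coded 1/2. The function name `recode_binary_pheno` and the file names are hypothetical. Any value other than 1 or 2 is written as NA, on the assumption that it represents a missing phenotype.

```shell
# Sketch only: recode_binary_pheno is a hypothetical helper, not part of plink2.
# Usage: recode_binary_pheno <input.tsv> <output.tsv> <phenotype column name>
recode_binary_pheno() {
    awk -v col="$3" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 {
        # Locate the phenotype column by its header name.
        for (i = 1; i <= NF; i++) if ($i == col) c = i
        if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
        print
        next
    }
    {
        # Shift {1, 2} to {2, 3}; treat everything else as missing (assumption).
        if ($c == "1") { $c = "2" }
        else if ($c == "2") { $c = "3" }
        else { $c = "NA" }
        print
    }' "$1" > "$2"
}

# Example call with placeholder file names:
# recode_binary_pheno pheno.tsv pheno_recoded.tsv Pheno
```

The recoded file would then be passed to --pheno in place of the original. The covariate columns are left untouched, so the same file can still serve as the --covar input.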
