Question: PLINK logistic regression analysis and covariates
alhamidi.reem wrote (8 months ago):

Hi everyone,

I'm struggling to interpret the results that PLINK generates. I'm not sure which output to look at, or which values to treat as statistically significant.

I'm using SNP data from a case/control study. First, I ran the basic association analysis with the following command:

plink --file xxx --assoc --ci 0.95 --out newfile

which generated a file with a p-value for each SNP, along with the odds ratio (OR), its lower/upper 95% confidence limits (L95/U95), etc.
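
For reference, a rough sketch of the kind of test --assoc performs on a case/control phenotype: a 1-df chi-square test on the 2x2 table of A1/A2 allele counts in cases vs controls, with OR = (odds of A1 among case alleles) / (odds of A1 among control alleles). This assumes no continuity correction, the counts are made up for illustration, and allelic_test is a hypothetical helper, not PLINK code.

```python
import math

def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """1-df allelic chi-square test on a 2x2 table of allele counts (no continuity correction)."""
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    row_tot = [sum(row) for row in table]
    col_tot = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            chisq += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chisq / 2))
    # Odds of carrying A1 among case alleles relative to control alleles
    odds_ratio = (case_a1 / case_a2) / (ctrl_a1 / ctrl_a2)
    return chisq, p, odds_ratio

# Hypothetical allele counts, for illustration only
chisq, p, odds_ratio = allelic_test(120, 380, 180, 320)
print(f"CHISQ={chisq:.3f}  P={p:.3g}  OR={odds_ratio:.4f}")
```

An OR below 1 here means A1 is less common among case alleles than among control alleles.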

I then wanted to run a logistic regression with a covariate file that has 3 columns (FID, IID, and a covariate, which I will call CN), to see whether the covariate has an effect on the phenotype (the outcome):

plink --file xx --logistic --covar covariate.txt --covarname CN --ci 0.95 --out newfile

which gave me the following results:

CHR  SNP   BP       A1  TEST  NMISS  OR      STAT    P
8    SNP1  6962046  G   ADD   1058   0.646   -3.607  0.00031
8    SNP1  6962046  G   CN    1058   0.9289  -1.9    0.05737

I'd really appreciate it if someone could explain what these results mean. Also, SNP1 had a p-value of 2.95E-06 in the --assoc analysis; I don't know if that helps.

I just want to know which command to use to get the effect of my covariate on the phenotype, and how to interpret the output. I have also used the --genotypic flag in the logistic regression analysis, but it gave me different p-values and ORs. Which test do I need to use?

Using the --genotypic flag with --logistic generated output that looks like this:

CHR  SNP   BP       A1  TEST      NMISS  OR      SE       L95     U95    STAT     P
8    SNP1  6962046  G   ADD       1058   0.6935  0.1951   0.4731  1.016  -1.876   0.06063
8    SNP1  6962046  G   DOMDEV    1058   0.8997  0.2302   0.573   1.413  -0.4593  0.646
8    SNP1  6962046  G   CN        1058   0.93    0.03887  0.8618  1.004  -1.868   0.06183
8    SNP1  6962046  G   GENO_2DF  1058   NA      NA       NA      NA     13.32    0.001284
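
The columns in this table are tied together by the standard Wald formulas for logistic regression: SE is the standard error of log(OR), L95/U95 = exp(log(OR) +/- 1.96 * SE), STAT = log(OR) / SE, and P is the two-sided normal p-value; for the GENO_2DF row, STAT is treated as a chi-square with 2 degrees of freedom. The sketch below recomputes them from the rounded OR and SE values printed above, so small differences in the last digit are expected. wald_summary and chisq_2df_p are hypothetical helpers for illustration.

```python
import math

def wald_summary(odds_ratio, se, z_crit=1.959964):
    """Rebuild L95, U95, STAT and P from OR and SE (SE is on the log-odds scale)."""
    beta = math.log(odds_ratio)                # logistic regression coefficient
    l95 = math.exp(beta - z_crit * se)
    u95 = math.exp(beta + z_crit * se)
    stat = beta / se                           # Wald z statistic
    p = math.erfc(abs(stat) / math.sqrt(2))    # two-sided normal p-value
    return l95, u95, stat, p

def chisq_2df_p(stat):
    """Upper-tail p-value of a chi-square with 2 df, as used for a 2-df joint test."""
    return math.exp(-stat / 2)

for test, odds_ratio, se in [("ADD", 0.6935, 0.1951), ("CN", 0.93, 0.03887)]:
    l95, u95, stat, p = wald_summary(odds_ratio, se)
    print(f"{test}: L95={l95:.4f} U95={u95:.3f} STAT={stat:.3f} P={p:.4g}")

print(f"GENO_2DF: P={chisq_2df_p(13.32):.4g}")
```

Reading the table this way, the confidence interval for ADD spans 1 (0.4731 to 1.016), which matches its p-value being above 0.05.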
Tags: snp, plink, association
(modified 8 months ago by Kevin Blighe)

The file has a header line. Why are you ignoring it?

Every time you run plink, it prints a URL pointing to online documentation. Why are you ignoring that, too?

(reply by chrchang523, 8 months ago)

Thank you for pointing that out, @chrchang523. I appreciate your time and effort in helping me out :). But to answer your questions: I have definitely not ignored either of those things, because they are both part of the analysis and of understanding it. I just find PLINK difficult to understand fully, and after six months of working with it on and off, I thought I'd try to seek help. I don't think this platform should be used to put people down. We are not trying to find an easy way out; we are trying to understand.

with best wishes,


(reply by alhamidi.reem, 8 months ago)

I stand by the harsh tone of my comment, because you did not include the header line in your question. There is simply no excuse for that when you are asking what the columns mean.

(edit: ok, I see that you have edited in the column headers. Now we can get somewhere.)

(reply by chrchang523, 8 months ago; edited)
Kevin Blighe wrote (8 months ago):

Hey, in the first example that you give:

CHR     SNP  BP      A1  TEST  NMISS OR     STAT    P
8       SNP1 6962046 G   ADD   1058  0.646  -3.607  0.00031
8       SNP1 6962046 G   CN    1058  0.9289 -1.9    0.05737

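Here, the first row (ADD) is the test for the SNP after controlling for the covariate CN: each additional copy of the A1 allele (G) multiplies the odds of being a case by 0.646 (p = 0.00031). The second row (CN) is the test for the covariate against the phenotype under study, adjusted for the SNP (OR = 0.9289 per one-unit increase in CN, p = 0.05737).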
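
Because ADD is modelled as a count of A1 alleles, and CN (assumed here to be a numeric covariate) also enters the model linearly, each odds ratio acts multiplicatively per unit, holding the other term fixed. A small sketch using the ORs from the table above; odds_multiplier is a hypothetical helper, not part of PLINK.

```python
def odds_multiplier(odds_ratio, units):
    """Multiplicative change in the odds of being a case for `units` steps of one term.

    For ADD a step is one extra copy of A1; for a numeric covariate it is one unit.
    """
    return odds_ratio ** units

# ADD row: odds relative to carrying 0 copies of G (the A1 allele)
for copies in (0, 1, 2):
    print(f"{copies} copies of G: odds x {odds_multiplier(0.646, copies):.3f}")

# CN row: odds for a 1- or 5-unit increase in CN, holding the SNP genotype fixed
for delta in (1, 5):
    print(f"CN +{delta}: odds x {odds_multiplier(0.9289, delta):.3f}")
```

If CN is instead a 0/1 indicator, its OR simply compares the two groups.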


In your second, '--genotypic', example, one has to understand that there are different ways of conducting analyses using genetic variant data:

  • additive models (ADD) - the most basic; simply based on allele tallies / dosage, with each individual carrying 0, 1, or 2 copies of the tested (disease) allele. Usually it is the minor allele that is counted and tested (in PLINK output, this is the A1 allele).
  • genotypic models (GENO) - based on the fact that there are 3 possible genotypes at a biallelic position: AA | AB | BB. With --genotypic, PLINK adds a dominance-deviation term (DOMDEV) to the additive term, and GENO_2DF is the joint 2-df test of both terms. Alternatively, one can assume a specific dominant or recessive mode of inheritance.
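
As I understand PLINK's documentation, --genotypic codes each SNP as two terms: an additive term (0/1/2 copies of A1) and a dominance-deviation term (DOMDEV) that is 1 for heterozygotes and 0 for both homozygotes. A minimal sketch of that coding, assuming genotypes are given as two-character strings; genotype_terms is a hypothetical helper, and "A" stands in for the other allele, which the question does not show.

```python
def genotype_terms(genotype, a1):
    """Return the (ADD, DOMDEV) codes for a two-allele genotype string such as "GA".

    ADD counts copies of the tested allele A1 (0, 1 or 2);
    DOMDEV is 1 for heterozygotes and 0 for either homozygote.
    """
    add = sum(1 for allele in genotype if allele == a1)
    domdev = 1 if add == 1 else 0
    return add, domdev

# "A" is a placeholder for the other allele; the question only shows A1 = G
for g in ("GG", "GA", "AA"):
    print(g, genotype_terms(g, a1="G"))
# GG (2, 0)
# GA (1, 1)
# AA (0, 0)
```

Because DOMDEV lets heterozygotes depart from the midpoint between the two homozygotes, the ADD coefficient in the --genotypic model is not the same quantity as in the purely additive model, which is one reason its OR and p-value change.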

If you are unsure about the second part with --genotypic, then just use the more simple case as in your first example.
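
With --out newfile, PLINK writes the --logistic results to newfile.assoc.logistic, with one row per SNP per model term (PLINK's --hide-covar flag can suppress the covariate rows when you run it). If you already have the file, a minimal Python sketch that keeps only the SNP rows (TEST = ADD) and sorts them by P might look like this; read_logistic is a hypothetical helper, not part of PLINK.

```python
import io

def read_logistic(handle, test="ADD"):
    """Keep rows of a PLINK .assoc.logistic table whose TEST matches `test`,
    sorted by P with NA values last."""
    rows = [line.split() for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    records = [dict(zip(header, fields)) for fields in body]
    kept = [r for r in records if r["TEST"] == test]

    def p_value(record):
        # PLINK prints NA when a model could not be fitted
        return float("inf") if record["P"] == "NA" else float(record["P"])

    return sorted(kept, key=p_value)

# Rows copied from the first example; on real data use
# with open("newfile.assoc.logistic") as fh: hits = read_logistic(fh)
example = io.StringIO(
    "CHR SNP BP A1 TEST NMISS OR STAT P\n"
    "8 SNP1 6962046 G ADD 1058 0.646 -3.607 0.00031\n"
    "8 SNP1 6962046 G CN 1058 0.9289 -1.9 0.05737\n"
)
for row in read_logistic(example):
    print(row["CHR"], row["SNP"], row["TEST"], row["OR"], row["P"])
```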

