Question: Association test in Plink
1
gravatar for biogirl
5 months ago by
biogirl190
European Union
biogirl190 wrote:

Hi all,

It's been a while since I've used plink, and either I've lost the knack in my old(er) age or somethings changed (or a combo of both). I'm using plink version 1.90 (what's installed on high performance computer here) and I'm just banging my head against the wall, so hoping someone can point out what I've overlooked.

I have 218 isolates, 88 cases, 130 controls (case=2, control=1 in column six of .ped file). Here's an example of the .ped file:

Ind1 Ind1 0 0 0 2 A A 0 0 0 0 0 T T G....
Ind2 Ind2 0 0 0 2 A A 0 0 0 0 0 G G C....
Ind3 Ind3 0 0 0 1 0 0 0 0 0 0 0 T T G.....
And so on.

I ran the following command:

plink --file GWAS --allow-extra-chr --assoc --allow-no-sex

And the resulting .assoc file looks like this:

CHR  SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR 
26    .       6996    0        0        0    1           NA           NA           NA 
26    .       7316    0       NA        0    1           NA           NA           NA 
26    .       7321    0       NA        0    1           NA           NA           NA 
26    .       7392    0       NA        0    1           NA           NA           NA

As you can see, it's not calculating any p-values.

Can anyone provide me with clues as to what I'm missing? I tried converting all my data to binary (i.e. 0 for ref, 1 for a SNP) to eliminate the possibility of anything other than biallelic sites, but got the same result. Any help is appreciated.

plink association gwas • 301 views
ADD COMMENTlink modified 5 months ago by zx87549.0k • written 5 months ago by biogirl190

Check you have a phenotype - case control status, and check frequency of SNPs --freq ?

ADD REPLYlink written 5 months ago by zx87549.0k

Hiya,

Phenotypes seem ok:

Among remaining phenotypes, 88 are cases and 130 are controls

I think the clue may be in the how the SNP are in the .ped file. Output was 'Total genotyping rate is 0.188953' but the resulting .frq file was perhaps a bit weird? It looks a little like this:

CHR SNP A1 A2 MAF NCHROBS

26 . 0 A 0 186

26 . 0 C 0 2

26 . 0 T 0 8

26 . 0 A 0 4

26 . 0 G 0 6

26 . 0 T 0 8

26 . 0 G 0 8

26 . 0 T 0 8

26 . 0 G 0 4

26 . 0 T 0 2

26 . 0 A 0 2

26 . 0 T 0 2

26 . 0 T 0 28

I would expect A1 to be a SNP, and there to be something in MAF, right?

ADD REPLYlink modified 5 months ago • written 5 months ago by biogirl190

I'd filter by missingness, usually at 95%, then you wouldn't have any data left.

And the MAF is 0 for all (in this example subset). These would have to be filtered out in QC step: monomorphic SNPs

ADD REPLYlink written 5 months ago by zx87549.0k

Ok I think I know what the problem is - when converting from vcf to .ped, the SNPs are kept as nucleotides (A,C,G,T) but the reference is left as '0'. So I have a mixture of 0 and nucleotides in the .ped file.

Any idea on how to convert from vcf to bed and retain the reference allele, instead of getting '0'?

ADD REPLYlink written 5 months ago by biogirl190

plink —vcf should work.

ADD REPLYlink written 5 months ago by chrchang5236.5k

It doesn't retain the reference - it just puts that as '0' and keeps the SNPs as nucleotides. This is how I've been making my .ped files

ADD REPLYlink written 5 months ago by biogirl190

Can you post the .log file from your "plink --vcf" run, and the first few nonheader lines from your VCF?

ADD REPLYlink written 5 months ago by chrchang5236.5k

Sure! Here's the .log output:

   Options in effect:
  --allow-extra-chr
  --allow-no-sex
  --assoc
  --file GWAS

Random number seed: 1567672357
193423 MB RAM detected; reserving 96711 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (291108 variants, 218 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
291108 variants loaded from .bim file.
218 people (0 males, 0 females, 218 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
218 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 218 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.188953.
291108 variants and 218 people pass filters and QC.
Among remaining phenotypes, 88 are cases and 130 are controls.
Writing C/C --assoc report to plink.assoc ... done.`

And here's the first non-header line from the VCF (sorry, there's 218 isolates in here so its a lot):

MT      7316    .       T       C       11594.00        PASS    ABHom=0.987;AC=1;AF=1.00;AN=1;BaseQRankSum=3.224;ClippingRankSum=0.000;DP=449;FS=10.871;MLEAC=1;MLEAF=1.00;MQ=52.18;MQRankSum=-1.591;OND=0.013;QD=25.88;ReadPosRankSum=1.865;SF=145;SOR=1.877   GT:PL:AD:GQ:DP  .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       1:.,.,.:6,442:99:448    .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .
ADD REPLYlink modified 5 months ago by zx87549.0k • written 5 months ago by biogirl190
2
gravatar for chrchang523
5 months ago by
chrchang5236.5k
United States
chrchang5236.5k wrote:

Use --make-bed instead of --recode to keep both allele codes when only one of them is actually present.

ADD COMMENTlink modified 5 months ago by zx87549.0k • written 5 months ago by chrchang5236.5k

Cool, thanks for that - that has solved the issue of replacing the '0' with the reference allele - so I'm actually getting nucleotides for A1 and A2 now.

But I'm still not getting values for CHISQ, P or OR - they are still all NA when I try to test for association:

plink --bfile pleasework --allow-extra-chr --allow-no-sex --assoc

(yes, I named my files 'pleasework'!)

ADD REPLYlink written 5 months ago by biogirl190

That’s because only one of the two alleles is actually present for those variants. You can only get association results when the minor allele frequency (MAF) is greater than zero.

The missingness rate in your dataset is extremely high. One possibility is that something went wrong in how your VCF was generated, replacing all copies of the second allele with missing genotypes; this seems likely if ALL your variants have MAF 0.

ADD REPLYlink written 5 months ago by chrchang5236.5k

Oh wow, ok! Thanks for that. I realise this is bordering on a new topic now (and perhaps I should make a new thread?) but any ideas on how to fix the vcf? I merged all my vcfs into one using bcftools:

bcftools merge --force-samples *vcf.gz -o all_isolates.vcf

Where all the individuals vcfs had been compressed and indexed using bgzip and tabix

ADD REPLYlink written 5 months ago by biogirl190

Please post a new question.

ADD REPLYlink written 5 months ago by zx87549.0k

Done

bcftools merge creates high missingness rate

ADD REPLYlink written 5 months ago by biogirl190
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 791 users visited in the last hour