Association test in Plink
1
1
Entering edit mode
4.6 years ago
biogirl ▴ 210

Hi all,

It's been a while since I've used plink, and either I've lost the knack in my old(er) age or somethings changed (or a combo of both). I'm using plink version 1.90 (what's installed on high performance computer here) and I'm just banging my head against the wall, so hoping someone can point out what I've overlooked.

I have 218 isolates, 88 cases, 130 controls (case=2, control=1 in column six of .ped file). Here's an example of the .ped file:

Ind1 Ind1 0 0 0 2 A A 0 0 0 0 0 T T G....
Ind2 Ind2 0 0 0 2 A A 0 0 0 0 0 G G C....
Ind3 Ind3 0 0 0 1 0 0 0 0 0 0 0 T T G.....
And so on.

I ran the following command:

plink --file GWAS --allow-extra-chr --assoc --allow-no-sex

And the resulting .assoc file looks like this:

CHR  SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR 
26    .       6996    0        0        0    1           NA           NA           NA 
26    .       7316    0       NA        0    1           NA           NA           NA 
26    .       7321    0       NA        0    1           NA           NA           NA 
26    .       7392    0       NA        0    1           NA           NA           NA

As you can see, it's not calculating any p-values.

Can anyone provide me with clues as to what I'm missing? I tried converting all my data to binary (i.e. 0 for ref, 1 for a SNP) to eliminate the possibility of anything other than biallelic sites, but got the same result. Any help is appreciated.

plink GWAS association • 2.7k views
ADD COMMENT
0
Entering edit mode

Check you have a phenotype - case control status, and check frequency of SNPs --freq ?

ADD REPLY
0
Entering edit mode

Hiya,

Phenotypes seem ok:

Among remaining phenotypes, 88 are cases and 130 are controls

I think the clue may be in the how the SNP are in the .ped file. Output was 'Total genotyping rate is 0.188953' but the resulting .frq file was perhaps a bit weird? It looks a little like this:

CHR SNP A1 A2 MAF NCHROBS

26 . 0 A 0 186

26 . 0 C 0 2

26 . 0 T 0 8

26 . 0 A 0 4

26 . 0 G 0 6

26 . 0 T 0 8

26 . 0 G 0 8

26 . 0 T 0 8

26 . 0 G 0 4

26 . 0 T 0 2

26 . 0 A 0 2

26 . 0 T 0 2

26 . 0 T 0 28

I would expect A1 to be a SNP, and there to be something in MAF, right?

ADD REPLY
0
Entering edit mode

I'd filter by missingness, usually at 95%, then you wouldn't have any data left.

And the MAF is 0 for all (in this example subset). These would have to be filtered out in QC step: monomorphic SNPs

ADD REPLY
0
Entering edit mode

Ok I think I know what the problem is - when converting from vcf to .ped, the SNPs are kept as nucleotides (A,C,G,T) but the reference is left as '0'. So I have a mixture of 0 and nucleotides in the .ped file.

Any idea on how to convert from vcf to bed and retain the reference allele, instead of getting '0'?

ADD REPLY
0
Entering edit mode

plink —vcf should work.

ADD REPLY
0
Entering edit mode

It doesn't retain the reference - it just puts that as '0' and keeps the SNPs as nucleotides. This is how I've been making my .ped files

ADD REPLY
0
Entering edit mode

Can you post the .log file from your "plink --vcf" run, and the first few nonheader lines from your VCF?

ADD REPLY
0
Entering edit mode

Sure! Here's the .log output:

   Options in effect:
  --allow-extra-chr
  --allow-no-sex
  --assoc
  --file GWAS

Random number seed: 1567672357
193423 MB RAM detected; reserving 96711 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (291108 variants, 218 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
291108 variants loaded from .bim file.
218 people (0 males, 0 females, 218 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
218 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 218 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.188953.
291108 variants and 218 people pass filters and QC.
Among remaining phenotypes, 88 are cases and 130 are controls.
Writing C/C --assoc report to plink.assoc ... done.`

And here's the first non-header line from the VCF (sorry, there's 218 isolates in here so its a lot):

MT      7316    .       T       C       11594.00        PASS    ABHom=0.987;AC=1;AF=1.00;AN=1;BaseQRankSum=3.224;ClippingRankSum=0.000;DP=449;FS=10.871;MLEAC=1;MLEAF=1.00;MQ=52.18;MQRankSum=-1.591;OND=0.013;QD=25.88;ReadPosRankSum=1.865;SF=145;SOR=1.877   GT:PL:AD:GQ:DP  .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       1:.,.,.:6,442:99:448    .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .
ADD REPLY
2
Entering edit mode
4.6 years ago

Use --make-bed instead of --recode to keep both allele codes when only one of them is actually present.

ADD COMMENT
0
Entering edit mode

Cool, thanks for that - that has solved the issue of replacing the '0' with the reference allele - so I'm actually getting nucleotides for A1 and A2 now.

But I'm still not getting values for CHISQ, P or OR - they are still all NA when I try to test for association:

plink --bfile pleasework --allow-extra-chr --allow-no-sex --assoc

(yes, I named my files 'pleasework'!)

ADD REPLY
0
Entering edit mode

That’s because only one of the two alleles is actually present for those variants. You can only get association results when the minor allele frequency (MAF) is greater than zero.

The missingness rate in your dataset is extremely high. One possibility is that something went wrong in how your VCF was generated, replacing all copies of the second allele with missing genotypes; this seems likely if ALL your variants have MAF 0.

ADD REPLY
0
Entering edit mode

Oh wow, ok! Thanks for that. I realise this is bordering on a new topic now (and perhaps I should make a new thread?) but any ideas on how to fix the vcf? I merged all my vcfs into one using bcftools:

bcftools merge --force-samples *vcf.gz -o all_isolates.vcf

Where all the individuals vcfs had been compressed and indexed using bgzip and tabix

ADD REPLY
0
Entering edit mode

Please post a new question.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 1720 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6