1
1
Entering edit mode
2.7 years ago
biogirl ▴ 200

Hi all,

It's been a while since I've used plink, and either I've lost the knack in my old(er) age or somethings changed (or a combo of both). I'm using plink version 1.90 (what's installed on high performance computer here) and I'm just banging my head against the wall, so hoping someone can point out what I've overlooked.

I have 218 isolates, 88 cases, 130 controls (case=2, control=1 in column six of .ped file). Here's an example of the .ped file:

Ind1 Ind1 0 0 0 2 A A 0 0 0 0 0 T T G....
Ind2 Ind2 0 0 0 2 A A 0 0 0 0 0 G G C....
Ind3 Ind3 0 0 0 1 0 0 0 0 0 0 0 T T G.....
And so on.


I ran the following command:

plink --file GWAS --allow-extra-chr --assoc --allow-no-sex


And the resulting .assoc file looks like this:

CHR  SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR
26    .       6996    0        0        0    1           NA           NA           NA
26    .       7316    0       NA        0    1           NA           NA           NA
26    .       7321    0       NA        0    1           NA           NA           NA
26    .       7392    0       NA        0    1           NA           NA           NA


As you can see, it's not calculating any p-values.

Can anyone provide me with clues as to what I'm missing? I tried converting all my data to binary (i.e. 0 for ref, 1 for a SNP) to eliminate the possibility of anything other than biallelic sites, but got the same result. Any help is appreciated.

plink GWAS association • 1.7k views
0
Entering edit mode

Check you have a phenotype - case control status, and check frequency of SNPs --freq ?

0
Entering edit mode

Hiya,

Phenotypes seem ok:

Among remaining phenotypes, 88 are cases and 130 are controls


I think the clue may be in the how the SNP are in the .ped file. Output was 'Total genotyping rate is 0.188953' but the resulting .frq file was perhaps a bit weird? It looks a little like this:

CHR SNP A1 A2 MAF NCHROBS

26 . 0 A 0 186

26 . 0 C 0 2

26 . 0 T 0 8

26 . 0 A 0 4

26 . 0 G 0 6

26 . 0 T 0 8

26 . 0 G 0 8

26 . 0 T 0 8

26 . 0 G 0 4

26 . 0 T 0 2

26 . 0 A 0 2

26 . 0 T 0 2

26 . 0 T 0 28

I would expect A1 to be a SNP, and there to be something in MAF, right?

0
Entering edit mode

I'd filter by missingness, usually at 95%, then you wouldn't have any data left.

And the MAF is 0 for all (in this example subset). These would have to be filtered out in QC step: monomorphic SNPs

0
Entering edit mode

Ok I think I know what the problem is - when converting from vcf to .ped, the SNPs are kept as nucleotides (A,C,G,T) but the reference is left as '0'. So I have a mixture of 0 and nucleotides in the .ped file.

Any idea on how to convert from vcf to bed and retain the reference allele, instead of getting '0'?

0
Entering edit mode

0
Entering edit mode

It doesn't retain the reference - it just puts that as '0' and keeps the SNPs as nucleotides. This is how I've been making my .ped files

0
Entering edit mode

0
Entering edit mode

Sure! Here's the .log output:

   Options in effect:
--allow-extra-chr
--allow-no-sex
--assoc
--file GWAS

Random number seed: 1567672357
193423 MB RAM detected; reserving 96711 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (291108 variants, 218 people).
written.
291108 variants loaded from .bim file.
218 people (0 males, 0 females, 218 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
218 phenotype values loaded from .fam.
Before main variant filters, 218 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.188953.
291108 variants and 218 people pass filters and QC.
Among remaining phenotypes, 88 are cases and 130 are controls.
Writing C/C --assoc report to plink.assoc ... done.


And here's the first non-header line from the VCF (sorry, there's 218 isolates in here so its a lot):

MT      7316    .       T       C       11594.00        PASS    ABHom=0.987;AC=1;AF=1.00;AN=1;BaseQRankSum=3.224;ClippingRankSum=0.000;DP=449;FS=10.871;MLEAC=1;MLEAF=1.00;MQ=52.18;MQRankSum=-1.591;OND=0.013;QD=25.88;ReadPosRankSum=1.865;SF=145;SOR=1.877   GT:PL:AD:GQ:DP  .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       1:.,.,.:6,442:99:448    .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .

2
Entering edit mode
2.7 years ago

Use --make-bed instead of --recode to keep both allele codes when only one of them is actually present.

0
Entering edit mode

Cool, thanks for that - that has solved the issue of replacing the '0' with the reference allele - so I'm actually getting nucleotides for A1 and A2 now.

But I'm still not getting values for CHISQ, P or OR - they are still all NA when I try to test for association:

plink --bfile pleasework --allow-extra-chr --allow-no-sex --assoc


(yes, I named my files 'pleasework'!)

0
Entering edit mode

That’s because only one of the two alleles is actually present for those variants. You can only get association results when the minor allele frequency (MAF) is greater than zero.

The missingness rate in your dataset is extremely high. One possibility is that something went wrong in how your VCF was generated, replacing all copies of the second allele with missing genotypes; this seems likely if ALL your variants have MAF 0.

0
Entering edit mode

Oh wow, ok! Thanks for that. I realise this is bordering on a new topic now (and perhaps I should make a new thread?) but any ideas on how to fix the vcf? I merged all my vcfs into one using bcftools:

bcftools merge --force-samples *vcf.gz -o all_isolates.vcf
`

Where all the individuals vcfs had been compressed and indexed using bgzip and tabix

0
Entering edit mode

0
Entering edit mode