Question: PLINK returns NA values for all minor allele frequencies in my data
5.1 years ago by
Denmark
yorgos.athanasiadis40 wrote:

Hi,

I'm having the following problem with PLINK:

I am using the --freq command to calculate allele frequencies from an input that was created from 23andMe data. However, all I get in the .frq report are NA estimates for every minor allele frequency:

 CHR          SNP   A1   A2          MAF  NCHROBS
   1   rs12564807    0    A           NA        0
   1    rs3131972    A    G           NA        0
   1  rs148828841    A    C           NA        0
   1   rs12124819    G    A           NA        0
   1  rs115093905    T    G           NA        0
   1   rs11240777    A    G           NA        0
etc...

The same goes for --hwe and other commands. --missing is the only command that seems to work, so I know that the file is read correctly.

I don't know what's wrong, because PLINK reads the input files correctly. I suspect it is the allele coding, but I have tried several solutions and none of them work. Has anyone come across a similar issue?

Yorgos


Can you post your log file?

written 5.1 years ago by Maxime Lamontagne

Sure!

PLINK v1.90b2i 64-bit (8 Sep 2014)
4 arguments: --file test --freq --set-hh-missing
Hostname:
Working directory: /Users/
Start time: Tue Sep 30 17:26:49 2014

Random number seed: 1412090809
16384 MB RAM detected; reserving 8192 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (592555 variants, 723 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
592555 variants loaded from .bim file.
723 people (232 males, 491 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Calculating allele frequencies... done.
Warning: 206862 het. haploid genotypes present (see plink.hh ).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.986469.
--freq: Allele frequencies written to plink.frq .

End time: Tue Sep 30 17:27:05 2014

written 5.1 years ago by yorgos.athanasiadis

Try adding --nonfounders to the command line.  Normally, PLINK --freq and --hwe exclude all samples with at least one parental ID, so if everyone in your dataset has parental IDs (you need to use '0' to indicate an unknown parent), that would explain your result.

(You should also use the most recent build: there was a --nonfounders bug fixed on September 26th.)

If --nonfounders does not fix the problem, let me know.

modified 5.1 years ago • written 5.1 years ago by chrchang523
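
The founder rule described above can be checked directly on the input before running PLINK. What follows is a minimal Python sketch, not part of PLINK itself. It assumes a whitespace-delimited .ped or .fam file in which columns 3 and 4 hold the paternal and maternal IDs, and it treats a sample as a founder only when both are '0'. The function name `count_founders` is hypothetical.

```python
def count_founders(ped_lines):
    """Count founders and nonfounders in .ped/.fam-style lines.

    Assumed column order: FID, IID, paternal ID, maternal ID, sex, phenotype, ...
    A sample counts as a founder here only if both parental IDs are '0'.
    """
    founders = 0
    nonfounders = 0
    for line in ped_lines:
        # Split off only the first four fields; the long genotype tail stays unsplit.
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        paternal_id, maternal_id = fields[2], fields[3]
        if paternal_id == "0" and maternal_id == "0":
            founders += 1
        else:
            nonfounders += 1
    return founders, nonfounders


if __name__ == "__main__":
    # Two made-up samples: one with parental IDs, one without.
    example = [
        "FAM1 IND1 DAD1 MUM1 1 -9 A A G G",
        "FAM2 IND2 0 0 2 -9 A G G G",
    ]
    print(count_founders(example))
```

If the founder count for the real file is 0, an all-NA .frq report with NCHROBS of 0 is the expected outcome, and adding --nonfounders to the original command line (--file test --freq) should give actual frequencies.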

I spent the entire morning testing different files and I got to the exact same conclusion:

When I first built the ped file, I assigned a non-zero father and mother to all my individuals, so there were no founder individuals left to be used for allele frequency calculations. I was just about to rebuild the file with 0's for dads and mums, but then I saw your reply: the --nonfounders flag actually worked, so thank you so much!

I don't know if I should laugh or cry, ha ha ha...

written 5.1 years ago by yorgos.athanasiadis
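
The alternative mentioned above, rebuilding the ped file with 0's for both parents, could look roughly like the sketch below. It illustrates one possible approach and is not the poster's actual script. The helper name `reset_parental_ids` is hypothetical, and the code assumes the standard .ped column order (FID, IID, paternal ID, maternal ID, ...).

```python
def reset_parental_ids(ped_lines):
    """Yield .ped lines with the paternal and maternal IDs (columns 3 and 4) set to '0'."""
    for line in ped_lines:
        stripped = line.rstrip("\n")
        # Split off only the first four fields so the genotype columns are copied untouched.
        fields = stripped.split(None, 4)
        if len(fields) < 4:
            yield stripped  # pass blank or malformed lines through unchanged
            continue
        fields[2] = "0"  # paternal ID: '0' means unknown
        fields[3] = "0"  # maternal ID: '0' means unknown
        yield " ".join(fields)


if __name__ == "__main__":
    # One made-up sample with both parental IDs filled in.
    example = ["FAM1 IND1 DAD1 MUM1 1 -9 A A G G\n"]
    for fixed in reset_parental_ids(example):
        print(fixed)
```

To apply it to a real file, iterate over the open input file and write each yielded line plus a newline to a new output file. Note that this discards the pedigree information, whereas --nonfounders leaves the input as it is.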

Did you check your plink.hh file? It says you have a lot of haploid genotypes present. This suggests that your file format might be off.

modified 5.1 years ago • written 5.1 years ago by Ryan D

I did check it and I tried different things to solve the problem (including using the --set-hh-missing option and removing X, Y, XY and mtDNA SNPs), but the problem persists...

Any ideas? :-/

written 5.1 years ago by yorgos.athanasiadis