PLINK returns NA values for all minor allele frequencies in my data
8.2 years ago

Hi,

I'm having the following problem with PLINK:

I am using the --freq command to calculate allele frequencies from an input that was created from 23andMe data. However, all I get in the .frq report is NA for every minor allele frequency:

 CHR          SNP   A1   A2          MAF  NCHROBS
   1   rs12564807    0    A           NA        0
   1    rs3131972    A    G           NA        0
   1  rs148828841    A    C           NA        0
   1   rs12124819    G    A           NA        0
   1  rs115093905    T    G           NA        0
   1   rs11240777    A    G           NA        0
etc...


The same goes for --hwe and other commands. --missing is the only one that seems to work, so I know the file is being read correctly.

I don't know what's wrong, because PLINK reads the input files correctly. I suspect it is the allele coding, but I have tried several solutions and none of them work. Has anyone come across a similar issue?

Yorgos

freq plink maf SNP • 6.3k views

Can you post your log file?

Sure!

PLINK v1.90b2i 64-bit (8 Sep 2014)
4 arguments: --file test --freq --set-hh-missing
Hostname:
Working directory: /Users/
Start time: Tue Sep 30 17:26:49 2014

Random number seed: 1412090809
16384 MB RAM detected; reserving 8192 MB for main workspace.
Scanning .ped file... done.
Performing single-pass .bed write (592555 variants, 723 people).
written.
592555 variants loaded from .bim file.
723 people (232 males, 491 females) loaded from .fam.
Calculating allele frequencies... done.
Warning: 206862 het. haploid genotypes present (see plink.hh ).
Warning: Nonmissing nonmale Y chromosome genotype(s) present.
Total genotyping rate is 0.986469.
--freq: Allele frequencies written to plink.frq .

End time: Tue Sep 30 17:27:05 2014

Try adding --nonfounders to the command line. Normally, PLINK's --freq and --hwe exclude all samples with at least one parental ID, so if everyone in your dataset has parental IDs, that would explain your result. (An unknown parent must be coded as '0'.)

(You should also use the most recent build: there was a --nonfounders bug fixed on September 26th.)

If --nonfounders does not fix the problem, let me know.
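
To check whether this applies before rerunning anything, you can count founders directly from the sample columns of the .ped (or .fam) file. This is only a rough sketch: it assumes the usual whitespace-delimited layout with the paternal and maternal IDs in columns 3 and 4, and `count_founders` is a made-up helper name, not something PLINK provides.

```python
import io

def count_founders(lines):
    """Return (founders, nonfounders) for .ped/.fam-style lines.

    Assumed columns: FID IID PAT MAT SEX PHENO [genotypes...].
    A sample is counted as a founder only when both parental IDs
    are '0', following the rule described in the answer above.
    """
    founders = nonfounders = 0
    for line in lines:
        # maxsplit=4 avoids tokenising the (very wide) genotype part
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        pat, mat = fields[2], fields[3]
        if pat == "0" and mat == "0":
            founders += 1
        else:
            nonfounders += 1
    return founders, nonfounders

# Toy input; for real data pass an open file handle instead,
# e.g. open("test.ped") for the fileset used in this thread.
example = io.StringIO(
    "F1 S1 P1 M1 1 -9 A G\n"
    "F2 S2 P2 M2 2 -9 G G\n"
    "F3 S3 0 0 2 -9 A A\n"
)
print(count_founders(example))  # (1, 2)
```

If the first number comes back as 0, then every sample is a non-founder and `plink --file test --freq --nonfounders` is the thing to try.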

I spent the entire morning testing different files and I got to the exact same conclusion:

When I first built the ped file, I assigned a non-zero father and mother to all my individuals, so there were no founder individuals left to be used for allele frequency calculations. I was just about to rebuild the file with 0's for dads and mums, but then I saw your reply: the --nonfounders flag actually worked, so thank you so much!
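
For anyone landing here later, the effect can be illustrated with a toy calculation. This is a simplified sketch for a single diploid variant, not PLINK's actual implementation; the function name and the `include_nonfounders` parameter are invented for the illustration.

```python
from collections import Counter

def minor_allele_freq(samples, include_nonfounders=False):
    """samples: list of (pat_id, mat_id, (allele1, allele2)) for one variant.

    Returns (maf, nchrobs). maf is None when no chromosomes were
    observed, which is the situation behind the NA / 0 rows in the
    .frq report above.
    """
    counts = Counter()
    for pat, mat, genotype in samples:
        is_founder = pat == "0" and mat == "0"
        if not (is_founder or include_nonfounders):
            continue  # non-founders are skipped by default
        for allele in genotype:
            if allele != "0":  # '0' is the missing-allele code
                counts[allele] += 1
    nchrobs = sum(counts.values())
    if nchrobs == 0:
        return None, 0
    # A monomorphic variant has a minor allele frequency of 0
    maf = min(counts.values()) / nchrobs if len(counts) > 1 else 0.0
    return maf, nchrobs

# Every sample has non-zero parental IDs, as in my original .ped file
samples = [("P1", "M1", ("A", "G")), ("P2", "M2", ("G", "G"))]
print(minor_allele_freq(samples))                            # (None, 0)
print(minor_allele_freq(samples, include_nonfounders=True))  # (0.25, 4)
```

With no founders the observed chromosome count is 0 and there is nothing to divide by, hence NA; counting non-founders as well gives a usable estimate.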

I don't know if I should laugh or cry, ha ha ha...

Did you check your plink.hh file? It says you have a lot of haploid genotypes present. This suggests that your file format might be off.
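
One way to look at it is to tally the entries per sample. The sketch below assumes each line of plink.hh holds three whitespace-separated fields (family ID, within-family ID, variant ID), which is my understanding of the PLINK 1.9 layout and worth verifying against your own file; `hh_calls_per_sample` is a hypothetical helper name.

```python
from collections import Counter

def hh_calls_per_sample(lines):
    """Count .hh entries per (FID, IID).

    Assumed line layout: FID IID variant_ID.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:  # ignore blank or truncated lines
            counts[(fields[0], fields[1])] += 1
    return counts

# Toy input; for real data pass an open file handle,
# e.g. open("plink.hh")
example = [
    "F1 S1 rs1\n",
    "F1 S1 rs2\n",
    "F2 S2 rs1\n",
]
for sample, n in hh_calls_per_sample(example).most_common():
    print(sample, n)
```

Heterozygous haploid calls come from heterozygous genotypes on chromosomes that should carry a single copy (for example non-pseudoautosomal X in samples coded as male). If the counts pile up in a subset of samples, the sex column for those samples is a reasonable first thing to check, though that is a guess rather than a diagnosis.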

I did check it and I tried different things to solve the problem (including using the --set-hh-missing option and removing the X, Y, XY and mtDNA SNPs), but the problem persists...

Any ideas? :-/