Let's say I have this variant:
1 123141 . A C
My alternate allele at this site is C. The reported global MAF in 1000G is 0.8564. However, this is the allele frequency for A at that site, not C.
The options for running VEP are documented at http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html and these are the ones I am using:
--check_existing --check_alleles --gmaf --maf_1kg --maf_esp
With these options, the columns produced by --maf_1kg seem to give the per-population allele frequency of my alt allele. But I need the global allele frequency for my alt allele. I'm still trying to figure out how this works, and in the process of writing this question I keep finding new problems that don't make sense.
For rs200645137 +C insertion, VEP gives me this:
AFR_MAF | AMR_MAF  | EAS_MAF | EUR_MAF  | SAS_MAF
        | C:0.4024 | C:0.353 | C:0.4365 | C:0.2843
AFR_MAF is missing.
From dbsnp: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=200645137 I get:
AFR_MAF  | AMR_MAF | EAS_MAF  | EUR_MAF  | SAS_MAF
C:0.4024 | C:0.353 | C:0.4365 | C:0.2843 | C:0.456
So, really, SAS_MAF was missing and all the numbers were misaligned. VEP also gives me a global minor allele frequency of:
which is also in dbsnp.
ANNOVAR gives me 1 minus that value, or 0.516291, because I have the C allele present. But even that does not make sense to me, because when I calculate the allele frequency myself, I get this:
1008*0.4365 + 1006*0.2843 + 1322*0.4024 + 694*0.353 + 978*0.456 == 1948.921
1008 + 1006 + 1322 + 694 + 978 == 5008
1948.921 / 5008 == 0.3891615
So the 1000Genomes AF for my alt allele is 0.3891615.
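For anyone who wants to reproduce that number, here is a minimal Python sketch of the same weighted average. The allele counts are the weights I used above (I am assuming two alleles per sample, which is the diploid assumption), and the population labels are my pairing of those weights with the dbSNP row; the variable names are just for illustration.

```python
# Per-population (allele count, alt allele frequency) pairs from the post.
# Assumption: allele counts are 2 x sample size, i.e. everyone is diploid here.
populations = {
    "EAS": (1008, 0.4365),
    "EUR": (1006, 0.2843),
    "AFR": (1322, 0.4024),
    "AMR": (694, 0.353),
    "SAS": (978, 0.456),
}

# Total number of alleles across all populations.
total_alleles = sum(count for count, _ in populations.values())

# Expected number of alt alleles: each frequency weighted by its allele count.
alt_alleles = sum(count * freq for count, freq in populations.values())

# Weighted global frequency of the alt allele.
global_af = alt_alleles / total_alleles

print(f"total alleles: {total_alleles}")
print(f"global alt AF: {global_af:.7f}")
```

This comes out at about 0.3892, matching my hand calculation. It is only as good as the diploid assumption behind the weights.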
What is going on!? I'm definitely using 1000G phase 3 data.
Edit: Okay, the global allele frequencies don't match because it's on the X-chromosome and not everyone has two of those. But the VEP columns still don't match up. The SAS_MAF is being shoved into the next column, and AFR_MAF is either empty, or is filled with a number I can't find anywhere else. The ESP columns for AA_MAF and EA_MAF don't seem to have the correct numbers either.
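To make the column shift concrete, here is a small Python sketch that pairs each header with the value VEP and dbSNP gave me for rs200645137. The two value strings are copied from the output above, with the VEP one starting with an empty field because AFR_MAF came back blank; this only illustrates the mismatch and says nothing about why VEP produces it.

```python
# Column headers shared by both outputs.
headers = ["AFR_MAF", "AMR_MAF", "EAS_MAF", "EUR_MAF", "SAS_MAF"]

# Values as reported in the post; the VEP row begins with an empty AFR_MAF field.
vep_values = "|C:0.4024|C:0.353|C:0.4365|C:0.2843".split("|")
dbsnp_values = "C:0.4024|C:0.353|C:0.4365|C:0.2843|C:0.456".split("|")

# Pair each header with its value in each source.
vep = dict(zip(headers, vep_values))
dbsnp = dict(zip(headers, dbsnp_values))

# Columns where the two sources disagree.
mismatches = [h for h in headers if vep[h] != dbsnp[h]]

# True if the VEP row is the dbSNP row pushed one column to the right.
shifted_by_one = vep_values[1:] == dbsnp_values[:-1]

for h in headers:
    print(f"{h}: VEP={vep[h] or '(empty)'}  dbSNP={dbsnp[h]}")
print("mismatched columns:", mismatches)
print("VEP row is dbSNP row shifted right by one:", shifted_by_one)
```

Every column disagrees, and the VEP row is exactly the dbSNP row shifted right by one position, with the last value (SAS) falling off the end.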