Question: Interpreting ANNOVAR allele frequency output
eric.kern13 (United States) wrote, 3.8 years ago:

Hi Biostars,

I'm using ANNOVAR to annotate some WGS data. I want to pare down the list until I am left with variants of very low frequency. I've got ANNOVAR working, but I'd appreciate your help interpreting its output. Here's what I did.

To split out the rare variants, I issued this command (or rather, a version of it that works on my cluster).

perl --filter --buildver hg19 --maf_threshold 0.0001 --dbtype 1000g2014oct_all --outfile maf-1e4 --comment my_variants.avinput <path_to_humandb>

I got out 5.4 million common variants in the file maf-1e4.hg19_ALL.sites.2014_10_dropped and another 1.1 million rare variants in maf-1e4.hg19_ALL.sites.2014_10_filtered. Great -- except that there was no column giving allele frequencies for the rare variants! There was one for the common variants, though. I figured maybe only dropped variants get annotated (though that would be weird and annoying), and I tried this here hack to get ANNOVAR to drop my rare variants:

cp maf-1e4.hg19_ALL.sites.2014_10_filtered rare_variants.avinput

perl --filter --buildver hg19 --maf_threshold 0.0000000001 --dbtype 1000g2014oct_all --outfile maf-1e10 --comment rare_variants.avinput <path_to_humandb>

The result: all of my variants passed the filter!

wc -l maf-1e10.*
   0 maf-1e10.hg19_ALL.sites.2014_10_dropped
   1114818 maf-1e10.hg19_ALL.sites.2014_10_filtered

So none of my variants has a frequency between 1 in 10,000 and 1 in 10 billion, but loads are rarer than one copy on Earth? Unbelievable! Here's what I think actually happened: 1000 Genomes, having on the order of 1000 genomes to work with, cannot tell the difference between 0.01% and one copy per multiverse. So, I should interpret my 1.1 million variants as all being rare enough that they do not appear in 1000 Genomes. Is that right?
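To make that reasoning concrete, here is a rough sketch of the arithmetic. It assumes a panel of 2,504 diploid individuals (my assumption for the size of the 1000 Genomes release behind this database; swap in the real number if it differs). The function name `min_nonzero_af` is made up for illustration and is not part of ANNOVAR.

```python
def min_nonzero_af(n_samples: int, ploidy: int = 2) -> float:
    """Smallest non-zero allele frequency a panel can report:
    a single copy among all sampled chromosomes."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / (ploidy * n_samples)


# Assumption: 2,504 diploid individuals in the reference panel.
N_SAMPLES = 2504
floor = min_nonzero_af(N_SAMPLES)

# Compare both thresholds used in the question against the panel's floor.
for threshold in (1e-4, 1e-10):
    print(f"threshold {threshold:g} is below the panel floor "
          f"{floor:.6f}: {threshold < floor}")
```

Under that assumption, both 1e-4 and 1e-10 sit below the smallest non-zero frequency the panel can express, so a variant observed even once in the panel would exceed either threshold. That would explain why the two runs split the variants identically.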


eric.kern13 (United States) wrote, 3.8 years ago:

Below, I quote a response from Kai Wang, ANNOVAR creator/maintainer:

you should just use to print out allele frequency for all variants in your input. The word "minor allele frequency" cannot be defined well, because rare allele in one population will be common allele in another population, and generally should NEVER be used in genetics, and because the reference genome does contain many sites that have REFERENCE allele being rare allele in any human populations.

Another note is that it looks like those in "filtered" file in your question are those that are not annotated in 1000G, so you will not have an allele frequency measure. Nothing unexpected here.
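Following that note, one way to get a single table is to read the frequency from the "dropped" file and mark everything in the "filtered" file as absent from the database. The sketch below is only that, a sketch: it assumes each dropped line is tab-separated with the database name first, the frequency second, and the five avinput columns (chr, start, end, ref, alt) after that, and that filtered lines are plain avinput. Check a few lines of your own output before relying on it. `attach_frequencies` is a hypothetical helper, not an ANNOVAR function.

```python
def attach_frequencies(dropped_lines, filtered_lines):
    """Return a list of ((chr, start, end, ref, alt), frequency) pairs.

    frequency is a float for variants found in the database and
    None for variants that were not annotated at all.
    """
    records = []
    for line in dropped_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        # Assumed layout: db name, frequency, then the avinput columns.
        records.append((tuple(fields[2:7]), float(fields[1])))
    for line in filtered_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Not in the database, so there is no frequency to report.
        records.append((tuple(fields[:5]), None))
    return records


# Toy input standing in for the two output files (made-up variants).
dropped = ["1000g2014oct_all\t0.04\t1\t1404001\t1404001\tG\tT\tcomment\n"]
filtered = ["2\t500\t500\tA\tC\tcomment\n"]
for variant, freq in attach_frequencies(dropped, filtered):
    print(variant, "absent from database" if freq is None else freq)
```

With real files you would pass open file handles in place of the two toy lists.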
