While working on a couple of exome projects, I've run into situations where, for some of the variants we are calling, the reference genome allele is annotated in dbSNP as the minor allele (MAF<5%), so the variants are not very interesting to us.
Here is an example: rs7185 <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=7185> ref=C -> alt=T, MAF/MinorAlleleCount: C=0.142/311
I would like to filter those out, but cannot find a simple file with that information. I am using the 00-All.vcf.gz from dbSNP to identify known variants; it lists whether some of the alleles are low frequency, but not which allele is the minor one.
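To make the goal concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the INFO column carries a comma-separated list of allele frequencies with the reference allele first; the `CAF` tag name, the function names and the example line (including its chromosome and position) are assumptions for illustration, and I have not confirmed that my copy of the file provides such a field.

```python
def ref_is_minor(info, tag="CAF", threshold=0.5):
    """Return True if the reference allele frequency is below `threshold`.

    Assumes `tag` holds frequencies ordered reference first, then ALT alleles.
    Returns None when the tag is absent or the reference value is missing.
    """
    for field in info.split(";"):
        if field.startswith(tag + "="):
            # Tolerate an optional bracketed list such as [0.142,0.858].
            values = field[len(tag) + 1:].strip("[]").split(",")
            ref_freq = values[0].strip()
            if ref_freq in ("", "."):
                return None
            return float(ref_freq) < threshold
    return None


def flag_vcf_line(line, tag="CAF", threshold=0.5):
    """Parse one tab-separated VCF data line and flag it."""
    cols = line.rstrip("\n").split("\t")
    rsid, ref, alt, info = cols[2], cols[3], cols[4], cols[7]
    return rsid, ref, alt, ref_is_minor(info, tag, threshold)


# Made-up line: placeholder chromosome/position, and the 0.858 value is just
# 1 - 0.142 for illustration, not taken from dbSNP.
example = "1\t100\trs7185\tC\tT\t.\t.\tCAF=0.142,0.858"
print(flag_vcf_line(example))
```

The threshold is a parameter so the same check could use 0.5 (reference is the minor allele) or a stricter cutoff such as 0.05.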
I would like to be able to flag these variants - does someone know where I can get that data from, ideally somewhere that gets updated with dbSNP versions? Thank you so much!