Question: Best way to get Allele Frequency in dbSNP file of SNPs in a different VCF file?
11 weeks ago, markgodek wrote:


I have some VCFs generated by Mutect2. For each variant in these files, I want to get the allele frequency from the dbSNP and 1000 Genomes VCFs I have.

Could you recommend a tool to do this?

I considered writing a Python script to do it, but I thought there must be a better way than iterating over both the dbSNP and 1000 Genomes files for every line of my Mutect2 output.


Tags: allele frequency, vcf
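The worry about re-scanning the reference files for every Mutect2 record can be avoided with a hash join: read the reference VCF once into a lookup table keyed on CHROM, POS, REF and ALT, then stream the Mutect2 VCF once against it. A minimal awk sketch, assuming plain-text (not bgzipped) VCFs, one ALT allele per record, and the frequency stored as INFO/AF; the function name add_ref_af is hypothetical. Holding all of dbSNP in memory this way is likely impractical, which is one reason the bcftools annotate route discussed in the replies is the more robust option.

```shell
# Hypothetical helper: add_ref_af REF_VCF QUERY_VCF
# Prints CHROM, POS, REF, ALT and the reference AF for every QUERY_VCF record,
# or "." when no record with the same CHROM:POS:REF:ALT exists in REF_VCF.
# Assumes plain-text VCFs, one ALT allele per record, and AF stored as INFO/AF.
# Note: the NR == FNR trick assumes REF_VCF is non-empty.
add_ref_af() {
  local ref_vcf=$1 query_vcf=$2
  awk -F'\t' -v OFS='\t' '
    # First file: store AF for each CHROM:POS:REF:ALT key in a single pass.
    NR == FNR {
      if ($0 ~ /^#/) next
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^AF=/) {
          af[$1 ":" $2 ":" $4 ":" $5] = substr(info[i], 4)
          break
        }
      }
      next
    }
    # Second file: skip header lines, then do one hash lookup per record.
    /^#/ { next }
    {
      key = $1 ":" $2 ":" $4 ":" $5
      print $1, $2, $4, $5, ((key in af) ? af[key] : ".")
    }
  ' "$ref_vcf" "$query_vcf"
}
```

Compressed inputs can be passed through bash process substitution, e.g. add_ref_af <(gzip -dc ref.vcf.gz) mutect2.vcf, where both file names are placeholders.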

Why don't you use Ensembl VEP to annotate your VCFs?

(reply written 11 weeks ago by brunobsouzaa)

We decided to follow the GATK Best Practices, so we're using their Funcotator for functional annotation, but the "Frequency data for co-located variants" feature of VEP does look promising. Thanks.

(reply written 11 weeks ago by markgodek)
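For comparison with the Funcotator route, a VEP run that reports frequencies of co-located known variants might look like the minimal sketch below. It assumes a local VEP installation on PATH with an offline GRCh37 cache; --check_existing, --af and --af_1kg are VEP options for known-variant and 1000 Genomes frequency annotation, but frequency flags have changed between VEP releases, so check the documentation for your version. The wrapper name run_vep_af is hypothetical.

```shell
# Hypothetical wrapper: run_vep_af INPUT_VCF OUTPUT_VCF
# Assumes `vep` is on PATH and an offline GRCh37 cache is installed.
run_vep_af() {
  local in_vcf=$1 out_vcf=$2
  # --vcf writes VCF output; --check_existing looks up co-located known variants;
  # --af and --af_1kg add their 1000 Genomes allele frequencies.
  vep --input_file "$in_vcf" \
      --output_file "$out_vcf" \
      --vcf \
      --cache --offline \
      --assembly GRCh37 \
      --check_existing \
      --af \
      --af_1kg
}
```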

Take a look at bcftools annotate. You may also want to check that your input is normalized (left-aligned, parsimonious, and with multiallelic sites split) before using bcftools annotate. You can do the pre-processing with bcftools norm, or with vt decompose followed by vt normalize.

(reply written 11 weeks ago by RamRS)
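The bcftools norm pre-processing step could be wrapped as in this minimal sketch. It splits multiallelic records instead of dropping them and left-aligns indels against a reference FASTA. Running the same normalization on both the Mutect2 calls and the reference VCFs, with the same FASTA, is likely needed for records to match. The wrapper name and argument order are hypothetical.

```shell
# Hypothetical wrapper: normalize_vcf INPUT_VCF REFERENCE_FASTA OUTPUT_VCF_GZ
#   -m -any : split multiallelic records into one record per ALT allele
#   -f      : reference FASTA used to left-align and trim indels
#   -Oz     : write a bgzip-compressed VCF, then build a .tbi index for it
normalize_vcf() {
  local in_vcf=$1 ref_fa=$2 out_vcf=$3
  bcftools norm -m -any -f "$ref_fa" -Oz -o "$out_vcf" "$in_vcf" &&
    bcftools index -t "$out_vcf"
}
```

Usage might be normalize_vcf mutect2.vcf.gz human_g1k_v37.fasta mutect2.norm.vcf.gz, with all three file names as placeholders.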

Thanks. I'm new to all this, so when vcftools and bcftools gave me errors about multiallelic sites, I just removed them with SelectVariants --restrict-alleles-to BIALLELIC.

So after pre-processing with bcftools norm, I should be able to do something like this?

bcftools annotate -a 1000G_phase1.snps.high_confidence.b37.vcf.gz -h annotations.hdr -c CHROM,POS,ID,INFO/1000G_AF:=INFO/AF myVCF.vcf

with annotations.hdr being something like:

##INFO=<ID=1000G_AF,Number=.,Type=Float,Description="Allele Frequency from 1000 Genomes high confidence SNPs">
(reply written 11 weeks ago by markgodek, modified 11 weeks ago)
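A fuller version of that command might look like the sketch below. A few choices here are suggestions, not something confirmed in the thread. The file passed to -a is expected to be bgzip-compressed and tabix-indexed (bcftools index -t or tabix -p vcf). The new tag is called KG_AF, a hypothetical name, because the VCF 4.3 specification only allows INFO keys that begin with a letter or underscore, with 1000G as a reserved exception. CHROM,POS is left out of -c on the assumption that a VCF annotation file is matched by position automatically and those column names are only needed for tab-delimited annotation files.

```shell
# Header line declaring the new INFO tag (hypothetical name KG_AF);
# Number=A because AF carries one value per ALT allele.
cat > annotations.hdr <<'EOF'
##INFO=<ID=KG_AF,Number=A,Type=Float,Description="Allele frequency from 1000 Genomes phase 1 high-confidence SNPs">
EOF

# Hypothetical wrapper: add_kg_af ANNOTATION_VCF_GZ INPUT_VCF OUTPUT_VCF_GZ
add_kg_af() {
  local annot_vcf=$1 in_vcf=$2 out_vcf=$3
  # ID copies the ID column (e.g. rsIDs, if present);
  # INFO/KG_AF:=INFO/AF copies AF under the new tag name.
  bcftools annotate -a "$annot_vcf" -h annotations.hdr \
      -c ID,INFO/KG_AF:=INFO/AF \
      -Oz -o "$out_vcf" "$in_vcf" &&
    bcftools index -t "$out_vcf"
}
```

For example, add_kg_af 1000G_phase1.snps.high_confidence.b37.vcf.gz mutect2.norm.vcf.gz mutect2.kg_af.vcf.gz, where the two Mutect2 file names are placeholders.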

Pass 100 or so variants through your bcftools annotate command and verify a few by hand; that way, you will know it works. You may also want to use the --collapse parameter (e.g. --collapse none) so that records from the two VCFs are only matched when CHROM, POS, REF and ALT all agree.

(reply written 11 weeks ago by RamRS)
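The 100-variant check could be scripted roughly as follows. This is a sketch assuming annotations.hdr declares an INFO tag named KG_AF (a hypothetical name; substitute whatever your header defines). In bcftools, --collapse none means only records with identical REF and ALT alleles are treated as matching; newer bcftools releases may describe this behaviour under --pair-logic instead, so check bcftools annotate --help for your version. Both function names are hypothetical.

```shell
# Hypothetical helper: subset_vcf INPUT_VCF OUTPUT_VCF [N]
# Keeps the full header plus the first N records (default 100).
subset_vcf() {
  local in_vcf=$1 out_vcf=$2 n=${3:-100}
  { bcftools view -h "$in_vcf"; bcftools view -H "$in_vcf" | head -n "$n"; } > "$out_vcf"
}

# Hypothetical helper: spot_check ANNOTATION_VCF_GZ SUBSET_VCF
# Annotates with exact REF/ALT matching and prints one line per record
# so a few values can be compared against the annotation file by hand.
spot_check() {
  local annot_vcf=$1 small_vcf=$2
  bcftools annotate -a "$annot_vcf" -h annotations.hdr \
      -c ID,INFO/KG_AF:=INFO/AF --collapse none "$small_vcf" |
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\t%INFO/KG_AF\n'
}
```

To confirm an individual value, the matching record can be pulled straight from the indexed annotation file with a region query such as bcftools view -H -r 1:12345 annotation.vcf.gz (position and file name are placeholders).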

Thanks for your help. I was using bcftools annotate to change chromosome names in an earlier step, but didn't know it also had this function.

I've been taking a deeper dive into bcftools and vcftools, and it's really making my project easier.

Thanks again.

(reply written 11 weeks ago by markgodek)
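The chromosome-renaming step mentioned in the last reply matters for this workflow: bcftools annotate only matches records whose chromosome names agree, and the b37 1000 Genomes file uses names such as 1 and X rather than chr1 and chrX. A minimal sketch of building the two-column old/new map that bcftools annotate --rename-chrs reads, assuming the Mutect2 VCFs use chr-prefixed names. chrM is deliberately left out because hg19's chrM and b37's MT are different mitochondrial sequences. The function names are hypothetical.

```shell
# Hypothetical helper: print "chrN<TAB>N" lines for autosomes 1-22 plus X and Y.
make_chr_map() {
  local c
  for c in $(seq 1 22) X Y; do
    printf 'chr%s\t%s\n' "$c" "$c"
  done
}

# Hypothetical wrapper: strip_chr_prefix INPUT_VCF OUTPUT_VCF_GZ
# Renames chr1 -> 1 etc. so contig names agree with the b37 annotation file.
strip_chr_prefix() {
  local in_vcf=$1 out_vcf=$2
  make_chr_map > chr_map.txt
  bcftools annotate --rename-chrs chr_map.txt -Oz -o "$out_vcf" "$in_vcf" &&
    bcftools index -t "$out_vcf"
}
```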
Powered by Biostar version 2.3.0