Best way to get Allele Frequency in dbSNP file of SNPs in a different VCF file?
3.6 years ago
markgodek ▴ 50

Hi,

I have some VCFs generated by Mutect2. For each variant in these files, I want to get the allele frequency from the dbSNP and 1000 Genomes VCFs I have.

Could you recommend a tool to do this?

I considered writing a Python script to do it, but I suspect there is a better way than iterating over both dbSNP and 1000 Genomes for every line in my Mutect2 output.

Thanks.
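For reference, the nested iteration described in the question can be avoided even in plain Python: collect the (CHROM, POS, REF, ALT) keys from the small Mutect2 VCF first, then stream each large resource VCF once and keep only matching records. The following is a minimal sketch under several assumptions: the function names are made up for illustration, both files use the same chromosome naming, variants are already normalized, and the frequency sits in an `AF` INFO tag with one value per ALT allele.

```python
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def records(lines: Iterable[str]):
    """Yield (key, INFO dict, ALT index) for every ALT allele of every data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        for i, alt in enumerate(alts.split(",")):
            yield (chrom, pos, ref, alt), fields, i


def wanted_keys(query_lines: Iterable[str]) -> Set[Key]:
    """Keys of all variants in the (small) query VCF."""
    return {key for key, _, _ in records(query_lines)}


def collect_af(resource_lines: Iterable[str], wanted: Set[Key], tag: str = "AF") -> Dict[Key, float]:
    """Single pass over the (large) resource VCF, keeping AF only for wanted keys."""
    found: Dict[Key, float] = {}
    for key, fields, i in records(resource_lines):
        if key in wanted and tag in fields:
            values = fields[tag].split(",")
            if i < len(values) and values[i] != ".":
                found[key] = float(values[i])
    return found
```

In practice the line iterables would come from `open(...)` or `gzip.open(..., "rt")`. This only illustrates the idea; a dedicated tool is likely to be faster and more robust.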

vcf allele frequency • 1.8k views

Why don't you use Ensembl VEP to annotate your VCFs?


We decided to follow the GATK Best Practices, so we're using their Funcotator for functional annotation, but the "Frequency data for co-located variants" function of VEP does look promising. Thanks.


Take a look at bcftools annotate. You may also want to check that your input is normalized (left-aligned, parsimonious, and with multiallelic sites split) before using bcftools annotate. You can do these pre-processing steps with bcftools norm, or with vt decompose followed by vt normalize.
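To make "multiallelics split" and "parsimonious" concrete, here is a small illustrative sketch. It is not what bcftools norm or vt do internally, and it deliberately omits left-alignment, which needs the reference genome. The function names are hypothetical, and symbolic alleles such as `<DEL>` or `*` are not handled.

```python
from typing import List, Tuple


def trim(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove redundant shared bases so the REF/ALT pair is as short as possible."""
    # Drop shared trailing bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting POS right for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def split_and_trim(pos: int, ref: str, alts: str) -> List[Tuple[int, str, str]]:
    """Split a comma-separated ALT field into one trimmed record per ALT allele."""
    return [trim(pos, ref, alt) for alt in alts.split(",")]


# A multiallelic site becomes two independent biallelic records.
print(split_and_trim(100, "CAG", "CAT,C"))
```

The point is that the same variant can be written several ways, and two VCFs only match on CHROM, POS, REF and ALT if both use the same representation.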


Thanks. I'm new to all this, so when vcftools and bcftools gave me errors about multiallelic sites, I just removed those sites with SelectVariants --restrict-alleles-to BIALLELIC.

So after pre-processing with bcftools norm, I should be able to do something like this?

bcftools annotate -a 1000G_phase1.snps.high_confidence.b37.vcf.gz -h annotations.hdr -c CHROM,POS,ID,INFO/1000G_AF:=INFO/AF myVCF.vcf

with annotations.hdr being something like

##INFO=<ID=1000G_AF,Number=.,Type=Float,Description="Allele Frequency from 1000 Genomes high confidence SNPs">
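One detail worth checking in a header like this is the tag name. As far as I read the VCF 4.3 specification, INFO keys should not start with a digit, with `1000G` itself being the only reserved exception, so an ID such as `1000G_AF` may be rejected by strict validators even if bcftools accepts it. The sketch below encodes that pattern as I understand it. The helper names and the alternative tag `KG_AF` are made up for illustration, and `Number=A` (one value per ALT allele) is an assumption about how AF is declared in the source file.

```python
import re

# INFO key pattern as I read it in the VCF 4.3 specification (an assumption;
# check the spec version that matches your files).
INFO_KEY = re.compile(r"([A-Za-z_][0-9A-Za-z_.]*|1000G)")


def is_valid_info_key(key: str) -> bool:
    """True if the whole key matches the pattern above."""
    return INFO_KEY.fullmatch(key) is not None


def info_header(key: str, number: str, vtype: str, description: str) -> str:
    """Build an ##INFO header line, refusing keys that fail the pattern."""
    if not is_valid_info_key(key):
        raise ValueError(f"INFO key {key!r} does not match the expected pattern")
    return f'##INFO=<ID={key},Number={number},Type={vtype},Description="{description}">'


print(is_valid_info_key("1000G_AF"))  # tag name used in the post
print(info_header("KG_AF", "A", "Float", "Allele frequency from 1000 Genomes"))
```

Separately, as far as I recall, bcftools annotate expects both the input and the annotation VCF to be bgzip-compressed and indexed when -a points to a VCF, so a plain myVCF.vcf would probably need compressing and indexing first.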

Pass 100 variants through your bcftools annotate command and verify a few by hand; that way, you will know it works. You may want to use the --collapse parameter to make sure comparisons take CHROM, POS, REF and ALT into account when matching the two VCFs.
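A toy example shows why the matching mode is worth verifying. The data and function names below are invented for illustration and do not reproduce bcftools behaviour; check the documentation of your bcftools version for what each --collapse value actually does.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical resource: at chr1:100 only the A>G allele has a known frequency.
RESOURCE: Dict[Tuple[str, int], List[Tuple[str, str, float]]] = {
    ("1", 100): [("A", "G", 0.30)],
}


def af_by_position(chrom: str, pos: int) -> Optional[float]:
    """Position-only match: returns the first AF found at the site."""
    hits = RESOURCE.get((chrom, pos), [])
    return hits[0][2] if hits else None


def af_by_allele(chrom: str, pos: int, ref: str, alt: str) -> Optional[float]:
    """Allele-aware match: REF and ALT must agree as well."""
    for r, a, af in RESOURCE.get((chrom, pos), []):
        if (r, a) == (ref, alt):
            return af
    return None


# A somatic A>T call at the same position as a common A>G polymorphism:
print(af_by_position("1", 100))          # borrows the A>G frequency
print(af_by_allele("1", 100, "A", "T"))  # no frequency for this allele
```

When spot-checking annotated variants, sites where the resource lists a different ALT allele at the same position are the ones most likely to reveal a wrong matching mode.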


Thanks for your help. I was using bcftools annotate to change chromosome names in an earlier step, but didn't know it also had this function.

I've been taking a deeper dive into bcftools and vcftools, and it's really making my project easier.

Thanks again.
