Question: A tool to annotate one VCF file with INFO records of another VCF taking SNP into account?
4.3 years ago by
tsukanoffkirill20 wrote:

I would like to annotate records in one VCF file (input.vcf) with some of the INFO fields of the corresponding records from a database (db.vcf), but only if the recorded mutation matches exactly between the input and the database. For example, suppose I have three very simple VCF files:

>>> input.vcf
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0

>>> db1.vcf
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    B=4.0

>>> db2.vcf
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       T       100     PASS    B=4.0

Note that db1 and db2 describe different SNPs at the same locus: the SNP in db1.vcf matches the one in input.vcf, but the SNP in db2.vcf does not. I need a tool that can distinguish such cases and annotate the input record with information from the database only if the mutations match. Is there a tool that does this?

I tried GATK's VariantAnnotator and vcflib's vcfaddinfo. Unfortunately, both ignore the alleles and add the B=4.0 annotation in both cases.

Just to clarify, this is what I want in the case described:

$ some_tool input.vcf db1.vcf # SNPs in input and database match
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0;B=4.0
$ some_tool input.vcf db2.vcf # SNPs in input and database do not match
##fileformat=VCFv4.1
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    878638  .       G       A       100     PASS    A=3.0
4.3 years ago by
Cambridge, Cambridgeshire
Shane McCarthy320 wrote:

Try bcftools annotate:

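# bgzip-compress and tabix-index both files
bgzip -c input.vcf > input.vcf.gz; tabix -p vcf input.vcf.gz
bgzip -c db.vcf > db.vcf.gz; tabix -p vcf db.vcf.gz

# Copy INFO/B from db.vcf.gz into the matching records of input.vcf.gz
bcftools annotate -a db.vcf.gz -c CHROM,POS,REF,ALT,INFO/B input.vcf.gz > output.vcf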

This fills in INFO/B from db.vcf.gz only when CHROM, POS, REF and ALT all match.
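
To make the matching rule concrete, here is a rough awk sketch of the same idea, run on the three example files from the question. It is an illustration only, not how bcftools is implemented: `annotate_exact` is a hypothetical helper name, it assumes tab-separated, uncompressed, biallelic records, and it copies the whole database INFO column instead of selected tags.

```shell
# Illustration only: mimic exact CHROM/POS/REF/ALT matching with awk.
# annotate_exact is a hypothetical helper, not part of bcftools.
annotate_exact() {
    # $1 = database VCF, $2 = input VCF (both uncompressed, tab-separated)
    awk 'BEGIN { FS = OFS = "\t" }
         # First file (database): store INFO keyed by CHROM, POS, REF, ALT.
         NR == FNR { if ($0 !~ /^#/) db[$1, $2, $4, $5] = $8; next }
         # Second file (input): pass header lines through unchanged.
         /^#/ { print; next }
         # Append the database INFO only when the full key matches.
         { if (($1, $2, $4, $5) in db) $8 = $8 ";" db[$1, $2, $4, $5]
           print }' "$1" "$2"
}

# Recreate the example files from the question in a scratch directory.
tmp=$(mktemp -d)
hdr='##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "${hdr}chr1\t878638\t.\tG\tA\t100\tPASS\tA=3.0\n" > "$tmp/input.vcf"
printf "${hdr}chr1\t878638\t.\tG\tA\t100\tPASS\tB=4.0\n" > "$tmp/db1.vcf"
printf "${hdr}chr1\t878638\t.\tG\tT\t100\tPASS\tB=4.0\n" > "$tmp/db2.vcf"

annotate_exact "$tmp/db1.vcf" "$tmp/input.vcf"   # ALT matches: B is added
annotate_exact "$tmp/db2.vcf" "$tmp/input.vcf"   # ALT differs: record unchanged
```

For real data, bcftools annotate is still the better choice, since it works on compressed, indexed files and lets you pick individual INFO tags.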

modified 3.6 years ago • written 4.3 years ago by Shane McCarthy320

Just tested, and it does precisely what I want. Thank you so much!

I also see that you are a maintainer and developer of bcftools, so double thanks for both the tool and your answer :-)

modified 4.3 years ago • written 4.3 years ago by tsukanoffkirill20