Tool: vcfanno - quickly annotate a VCF with other VCFs, BEDs, BAMs
brentp (Salt Lake City, UT) wrote 3.7 years ago:

We have been developing vcfanno for some time now as a tool to quickly and easily annotate a VCF (that is, put stuff in its INFO field). It is very fast, and binaries are provided for all major systems, both 32-bit and 64-bit.

The README has an introduction and more detailed help; here we'll briefly cover the basics.

Users specify a simple conf file (TOML format) that contains the annotation files, the columns/fields to pull from each file, and the operations to perform on each file.


Simple Annotation

For example, we may wish to annotate our VCF with low-complexity regions in a BED file. That would be indicated by this conf section:
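A minimal sketch of such a section, assuming a hypothetical BED file named `hg19.LCR.bed.gz` and using column 2 as the dummy column (both are placeholders):

```toml
[[annotation]]
# Placeholder path; point this at your own low-complexity-region BED file.
file = "hg19.LCR.bed.gz"
# Name of the new INFO entry in the annotated VCF.
names = ["LCR"]
# Dummy column; the flag op only records that an overlap exists.
columns = [2]
ops = ["flag"]
```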


This tells vcfanno where to find the file, how to name it (LCR) in the annotated file, which column to pull from (we just use a dummy column), and the ops to perform on it. In this case, we just want to know if the variant overlapped with a low-complexity region so the flag op indicates presence of an overlap.


Other ops include mean, max, min, concat, uniq, etc. In many cases, each query variant will only overlap a single annotation, in which case, the choice of op has little effect.
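To make the point about single overlaps concrete, here is a rough illustration of what such reductions do to the values collected for one query variant. This is not vcfanno's implementation; the `ops` object, the sample values, and the "," separator are assumptions made for the sketch, and it is meant to be run in Node.js rather than inside vcfanno.

```javascript
// Illustrative reducers only; vcfanno computes its ops internally.
// The "," separator for concat/uniq is an assumption made for this sketch.
const ops = {
  mean: (vals) => vals.reduce((a, b) => a + b, 0) / vals.length,
  max: (vals) => Math.max(...vals),
  min: (vals) => Math.min(...vals),
  concat: (vals) => vals.join(","),
  uniq: (vals) => [...new Set(vals)].join(","),
};

// One overlapping annotation: the numeric ops all return the same value.
const single = [0.25];
console.log(ops.mean(single), ops.max(single), ops.min(single));

// Several overlapping annotations: now the choice of op matters.
const several = [0.25, 0.75, 0.75];
console.log(ops.mean(several), ops.max(several), ops.min(several));
console.log(ops.concat(several), ops.uniq(several));
```

With a single value, mean, max and min agree; with several values they diverge, which is when picking the right op starts to matter.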

Annotate With Clinvar

Note that this is an advanced example to show the customizability of vcfanno; nearly all annotations will have a simpler configuration than this one.

For some annotations, the built-in operations are not sufficient. For example, ClinVar provides a VCF where the fields are encoded. The CLNSIG field holds numbers that indicate the significance of the variant, which we'd otherwise have to look up. We can add these as flags to our new VCF using a custom javascript op to handle the annotation. A pathogenic variant in ClinVar is encoded with the number 5, so we write the following javascript:

function clinvar_pathogenic_flag(vals){
    for(var i = 0; i < vals.length; i++){
        if(vals[i] == 5){
            return true
        }
    }
    return false
}

which returns true if any of the CLINVAR variants overlapping the current query variant are pathogenic. We then use that javascript in the ops field as:
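A minimal sketch of such a section, assuming a hypothetical ClinVar file name and INFO name, and assuming the function is called with the collected values passed in as `vals`:

```toml
[[annotation]]
# Placeholder path to a ClinVar VCF.
file = "clinvar.vcf.gz"
# INFO field to pull from the ClinVar VCF.
fields = ["CLNSIG"]
# Hypothetical name for the new flag in the annotated VCF.
names = ["clinvar_pathogenic_flag"]
# The "js:" prefix marks the op as javascript.
ops = ["js:clinvar_pathogenic_flag(vals)"]
```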


Because the javascript function ends in _flag, vcfanno knows that a flag will be returned. Because the op starts with "js:", vcfanno knows it's javascript and evaluates the requested code with the values collected from the current variant.
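To see how the function behaves on the collected values, here is a small standalone check that can be run in Node.js. It assumes `vals` arrives as an array of CLNSIG codes; the sample arrays are invented for illustration.

```javascript
// A complete version of the function described above, so this runs standalone.
function clinvar_pathogenic_flag(vals) {
  for (var i = 0; i < vals.length; i++) {
    // Loose equality, so the string "5" matches as well as the number 5.
    if (vals[i] == 5) {
      return true;
    }
  }
  return false;
}

// Hypothetical CLNSIG codes collected from overlapping ClinVar records.
console.log(clinvar_pathogenic_flag([2, 5]));     // one record is pathogenic
console.log(clinvar_pathogenic_flag(["2", "3"])); // no pathogenic record
console.log(clinvar_pathogenic_flag([]));         // no overlapping records
```

Whether the values reach the function as numbers or strings depends on how vcfanno parses the field, which is why the sketch keeps the loose `==` comparison from the post.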



There are additional configuration examples here and here for common data sources such as ExAC, dbSNP, 1000G, fitcons, etc.



There is also support for annotating a VCF with CADD. Extensive help is available here.


Feedback appreciated:
