Tool: vcfanno: quickly annotate a VCF with other VCFs, BEDs, BAMs
brentp (Salt Lake City, UT) wrote, 4.2 years ago:

We have been developing vcfanno for some time now as a tool to quickly and easily annotate a VCF (that is, add entries to its INFO field). It is very fast, and binaries are provided for all major systems, in both 32- and 64-bit builds.

The README has a fuller introduction; here we'll briefly cover the basics.

Users specify a simple conf file (TOML format) that contains the annotation files, the columns/fields to pull from each file, and the operations to perform on each file.

 

Simple Annotation

For example, we may wish to annotate our VCF with low-complexity regions in a BED file. That would be indicated by this conf section:

[[annotation]]
file="LCR-hs37d5.bed.gz"
names=["LCR"]
columns=[2]
ops=["flag"]

This tells vcfanno where to find the file, how to name it (LCR) in the annotated file, which column to pull from (we just use a dummy column), and the ops to perform on it. In this case, we just want to know if the variant overlapped with a low-complexity region so the flag op indicates presence of an overlap.

Ops

Other ops include mean, max, min, concat, uniq, etc. In many cases, each query variant will only overlap a single annotation, in which case, the choice of op has little effect.
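To make the ops concrete, here is a rough JavaScript sketch of what each reduction would compute from the values gathered from all annotation records overlapping one query variant. The `ops` table and the `applyOp` helper are hypothetical names used only for illustration; this is not vcfanno's implementation, and the comma separator used for concat and uniq is an assumption.

```javascript
// Illustrative only: hypothetical reductions over the values collected
// from every annotation record that overlaps one query variant.
var ops = {
    // true if at least one annotation record overlapped
    flag: function (vals) { return vals.length > 0; },
    mean: function (vals) {
        var sum = 0;
        for (var i = 0; i < vals.length; i++) { sum += Number(vals[i]); }
        return sum / vals.length;
    },
    max: function (vals) { return Math.max.apply(null, vals.map(Number)); },
    min: function (vals) { return Math.min.apply(null, vals.map(Number)); },
    // join every value (separator is an assumption)
    concat: function (vals) { return vals.join(","); },
    // join distinct values, keeping first-seen order
    uniq: function (vals) {
        var seen = {}, out = [];
        for (var i = 0; i < vals.length; i++) {
            if (!Object.prototype.hasOwnProperty.call(seen, vals[i])) {
                seen[vals[i]] = true;
                out.push(vals[i]);
            }
        }
        return out.join(",");
    }
};

// Hypothetical dispatcher: look up the op by name and apply it.
function applyOp(name, vals) {
    return ops[name](vals);
}

// With a single overlapping record, most ops agree on the result.
console.log(applyOp("mean", [7]), applyOp("max", [7]), applyOp("min", [7]));
// With several overlaps the choice matters.
console.log(applyOp("mean", [2, 4]));            // 3
console.log(applyOp("uniq", ["a", "b", "a"]));   // a,b
```

With one overlapping value, mean, max and min all return that value, which is why the choice of op often has little effect in practice.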

Annotate With Clinvar

Note that this is an advanced example meant to show the customizability of vcfanno; nearly all annotations will have a simpler configuration than this one.

For some annotations, the built-in operations are not sufficient. For example, ClinVar provides a VCF where the fields are encoded. The CLNSIG field holds numbers that indicate the significance of the variant, which we'd otherwise have to look up. We can add these as flags to our new VCF by using a custom JavaScript op to handle the annotation. A pathogenic variant in ClinVar is encoded with the number 5, so we write the following JavaScript:

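// Return true if any overlapping ClinVar record has CLNSIG == 5 (pathogenic).
function clinvar_pathogenic_flag(vals){
    // declare i locally so the loop does not create a global variable
    for(var i=0;i<vals.length;i++){
        if(vals[i] == 5){
            return true
        }
    }
    return false
}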

which returns true if any of the CLINVAR variants overlapping the current query variant are pathogenic. We then use that javascript in the ops field as:

[[annotation]]
file="clinvar_20150305.tidy.vcf.gz"
fields=["CLNSIG"]
names=["clinvar_pathogenic"]
ops=["js:clinvar_pathogenic_flag(vals)"]

Because the javascript function ends in _flag, vcfanno knows that a flag will be returned. Because the op starts with "js:", vcfanno knows it's javascript and evaluates the requested code with the values collected from the current variant.
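Before wiring the function into the conf file, it can be checked on its own in any JavaScript interpreter. The sketch below repeats the function (with the loop index declared locally) and calls it with made-up `vals` arrays. The sample values are invented, and since it is not stated here whether CLNSIG values arrive as numbers or strings, the loose `==` comparison is kept. `describeOp` is a hypothetical helper, not part of vcfanno, that merely mirrors the two naming conventions described above.

```javascript
// Same function as in the post, with the loop index declared locally.
function clinvar_pathogenic_flag(vals) {
    for (var i = 0; i < vals.length; i++) {
        // Loose equality so that both 5 and "5" match.
        if (vals[i] == 5) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper (not part of vcfanno): classify an op string using
// the "js:" prefix and "_flag" suffix conventions.
function describeOp(op) {
    var isJs = op.indexOf("js:") === 0;
    // Pull out the function name that follows "js:" and precedes "(".
    var match = /^js:\s*([A-Za-z_$][A-Za-z0-9_$]*)\s*\(/.exec(op);
    var isFlag = match !== null && /_flag$/.test(match[1]);
    return { isJs: isJs, isFlag: isFlag };
}

console.log(clinvar_pathogenic_flag([2, 5]));   // true
console.log(clinvar_pathogenic_flag(["5"]));    // true
console.log(clinvar_pathogenic_flag([2, 3]));   // false
console.log(clinvar_pathogenic_flag([]));       // false
console.log(describeOp("js:clinvar_pathogenic_flag(vals)"));
```

An empty `vals` array yields false, so a variant is only flagged when at least one overlapping record carries the value 5.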

 

Configuration

There are additional configuration examples here and here for common data sources such as ExAC, dbSNP, 1000G, fitcons, etc.

 

CADD

vcfanno also supports annotating a VCF with CADD. Extensive help is available here.

 

Feedback appreciated:

https://github.com/brentp/vcfanno
