Tool: vcfanno, quickly annotate a VCF with other VCFs, BEDs, BAMs
8.8 years ago
brentp 24k

We have been developing vcfanno for some time now as a tool to quickly and easily annotate a VCF (that is, add entries to its INFO field). It is very fast, and binaries are provided for all major systems, both 32- and 64-bit.

The README has a fuller introduction; here we'll briefly cover the basics.

Users specify a simple conf file (TOML format) that contains the annotation files, the columns/fields to pull from each file, and the operations to perform on each file.


Simple Annotation

For example, we may wish to annotate our VCF with low-complexity regions in a BED file. That would be indicated by this conf section:

[[annotation]]
file="LCR-hs37d5.bed.gz"
names=["LCR"]
columns=[2]
ops=["flag"]

This tells vcfanno where to find the file, how to name it (LCR) in the annotated file, which column to pull from (we just use a dummy column), and the ops to perform on it. In this case, we just want to know if the variant overlapped with a low-complexity region so the flag op indicates presence of an overlap.

Ops

Other ops include mean, max, min, concat, uniq, etc. In many cases, each query variant will only overlap a single annotation, in which case, the choice of op has little effect.
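To make the point about single overlaps concrete, here is a rough sketch of what these ops conceptually do to the list of values collected for one query variant. This is not vcfanno's own code; the `ops` object, the sample values, and the comma separator used for concat and uniq are assumptions made for illustration.

```javascript
// Illustrative only: these helpers mimic the idea behind the named ops,
// they are not taken from vcfanno.
var ops = {
    mean: function (vals) {
        var sum = 0;
        for (var i = 0; i < vals.length; i++) { sum += vals[i]; }
        return sum / vals.length;
    },
    max: function (vals) { return Math.max.apply(null, vals); },
    min: function (vals) { return Math.min.apply(null, vals); },
    // Assumed separator: a comma between collected values.
    concat: function (vals) { return vals.join(","); },
    uniq: function (vals) {
        var seen = {}, out = [];
        for (var i = 0; i < vals.length; i++) {
            var key = String(vals[i]);
            if (!Object.prototype.hasOwnProperty.call(seen, key)) {
                seen[key] = true;
                out.push(vals[i]);
            }
        }
        return out.join(",");
    }
};

var many = [2, 4, 4, 6]; // hypothetical: four overlapping annotations
var one = [4];           // hypothetical: a single overlapping annotation

console.log(ops.mean(many), ops.max(many), ops.min(many)); // 4 6 2
console.log(ops.concat(many), ops.uniq(many));             // 2,4,4,6 2,4,6
console.log(ops.mean(one), ops.max(one), ops.min(one));    // 4 4 4
console.log(ops.concat(one), ops.uniq(one));               // 4 4
```

With a single overlap every op reports the same value, which is why the choice matters mainly when several annotations overlap one variant.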

Annotate With Clinvar

Note that this is an advanced example meant to show how customizable vcfanno is; nearly all annotations will have a simpler configuration than this one.

For some annotations, the built-in operations are not sufficient. For example, ClinVar provides a VCF where the fields are encoded. The CLNSIG field holds numbers indicating the significance of the variant, which we'd otherwise have to look up. We can add these as flags to our new VCF by using a custom JavaScript op to handle the annotation. A pathogenic variant in ClinVar is encoded with the number 5, so we write the following JavaScript:

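function clinvar_pathogenic_flag(vals){
    // vals holds the CLNSIG values collected for the current query variant
    for(var i=0;i<vals.length;i++){
        // 5 is the CLNSIG code for pathogenic
        if(vals[i] == 5){
            return true
        }
    }
    return false
}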

which returns true if any of the ClinVar variants overlapping the current query variant are pathogenic. We then use that JavaScript in the ops field as:

[[annotation]]
file="clinvar_20150305.tidy.vcf.gz"
fields=["CLNSIG"]
names=["clinvar_pathogenic"]
ops=["js:clinvar_pathogenic_flag(vals)"]

Because the javascript function ends in _flag, vcfanno knows that a flag will be returned. Because the op starts with js:, vcfanno knows it's javascript and evaluates the requested code with the values collected from the current variant.
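Before wiring a function like this into the conf file, it can be handy to try it on its own. Below is a minimal sketch, assuming a plain JavaScript runtime such as Node.js; the input lists are made up for illustration and are not real ClinVar records.

```javascript
// Same logic as the function in the post, with a declared loop variable.
function clinvar_pathogenic_flag(vals) {
    for (var i = 0; i < vals.length; i++) {
        if (vals[i] == 5) {
            return true;
        }
    }
    return false;
}

// Hypothetical inputs: each array stands for the CLNSIG values
// collected from the records overlapping one query variant.
var checks = [
    clinvar_pathogenic_flag([2, 5]), // one overlapping record is pathogenic
    clinvar_pathogenic_flag(["5"]),  // loose == also matches the string "5"
    clinvar_pathogenic_flag([0, 2]), // no pathogenic record
    clinvar_pathogenic_flag([])      // no values at all
];

console.log(checks.join(",")); // true,true,false,false
```

The loose `==` comparison is kept on purpose in this sketch, so the check behaves the same whether the values arrive as numbers or as strings.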


Configuration

There are additional configuration examples here and here for common data sources such as ExAC, dbSNP, 1000G, fitcons, etc.


CADD

There is also support for annotating a VCF with CADD. Extensive help is available here.


Feedback appreciated:

https://github.com/brentp/vcfanno

VCF variant-annotation • 6.0k views

Hi, Thank you for this great tutorial! The link for CADD is broken. Would you please update it? Thank you in advance!


I updated the post to link to the new cadd docs.
