Question: VCF programming language APIs / parsers available?
William (Europe) wrote:

Are there any programming language APIs available yet for working with VCF files? The only tool I know of is VCFtools, which as far as I know has only a command-line interface and no programming interface.

What I am looking for is the VCF equivalent of Picard, SAMtools or BamTools: a library that you can call from your own code to parse through VCF files.

Preferably an object-oriented library in Java, but anything in Python or Perl will also do.

vcf api programming • 4.2k views
modified 6.4 years ago by Erik Garrison • written 6.5 years ago by William
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

See the latest post on my blog:

"Reading/Writing a VCF file with the GATK-API."

http://plindenbaum.blogspot.fr/2012/11/readingwriting-vcf-file-with-gatk-api.html

written 6.5 years ago by Pierre Lindenbaum

Nice. GATK of course has a VCF reader and writer in its source. I will have a look at this.

Reply written 6.5 years ago by William

I copied your code and it seems to work, but I get an exception when closing the VCF writer. Did you run into this?

    Exception in thread "main" java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.refdata.tracks.IndexDictionaryUtils.setIndexSequenceDictionary(IndexDictionaryUtils.java:85)
        at org.broadinstitute.sting.utils.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:100)
        at org.broadinstitute.sting.utils.variantcontext.writer.VCFWriter.close(VCFWriter.java:147)

Reply written 6.5 years ago by William

I am using the latest GATK lite jar file as a library.

Reply written 6.5 years ago by William

Strange. I cannot help you with only this stack trace.

Reply written 6.5 years ago by Pierre Lindenbaum

Well, the writer seems to write all the output before crashing on close, so I'll just look into it later.

Reply written 6.5 years ago by William
Sean Davis (National Institutes of Health, Bethesda, MD) wrote:

Python: https://github.com/jamescasbon/PyVCF

written 6.5 years ago by Sean Davis
Erik Garrison (Somerville, MA) wrote:

C++: vcflib

The design was meant to be similar to bamtools, which I found very easy to work with. You open a VCF file and iterate through its records, populating an object with the VCF record on each iteration. You can write out VCF records using the << operator. Documentation is sparse, because I continuously develop the library as a way of dealing with edge cases not handled by other systems such as VCFtools or the GATK. Variant.h should provide ample self-documentation. Function names are meant to be self-explanatory.

Example usage, counting the number of alternate alleles in a VCF file (taken from vcfcountalleles):

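#include <iostream>
#include <string>
#include "Variant.h"

using namespace std;
using namespace vcf;

int main(int argc, char** argv) {

    // expect the VCF file name as the only argument
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " <vcf file>" << endl;
        return 1;
    }

    VariantCallFile variantFile;
    string filename = argv[1];
    variantFile.open(filename);
    if (!variantFile.is_open()) {
        cerr << "could not open " << filename << endl;
        return 1;
    }

    int uniqueAlleles = 0;

    // var is repopulated with the next record on each iteration
    Variant var(variantFile);
    while (variantFile.getNextVariant(var)) {
        // var.alleles holds REF plus ALT alleles; use var.alt to count only alternates
        uniqueAlleles += var.alleles.size();
    }

    cout << uniqueAlleles << endl;

    return 0;

}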

There are a lot of existing functions in the library for doing everything from haplotype-based allele intersection to filtering to genotype annotation. The stock programs in the library are almost entirely designed to read and write VCF, so, for instance, whenever possible annotations are added to records rather than dumped in a separate format, allowing the streaming of the results of one into the next and so on.

I intend to document them and will post a news item here on BioStar and also on SEQanswers when I do.
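For readers without vcflib installed, the read, annotate, write pattern described above can be sketched in plain standard C++. This is only an illustration, not vcflib's implementation or API: `VcfRecord`, `split`, `countAlt`, `annotateStream` and the `NALT` INFO tag are all hypothetical names made up for this sketch, and it assumes well-formed, tab-separated VCF with at least the eight fixed columns.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal record type; NOT vcflib's Variant class.
// Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then any sample columns.
struct VcfRecord {
    std::vector<std::string> fields;
};

// Split a string on a single-character delimiter.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string item;
    std::istringstream ss(s);
    while (std::getline(ss, item, delim)) out.push_back(item);
    return out;
}

// Count the comma-separated alternate alleles in the ALT column (index 4).
// A lone "." means there is no alternate allele.
std::size_t countAlt(const VcfRecord& r) {
    if (r.fields.size() < 5 || r.fields[4] == ".") return 0;
    return split(r.fields[4], ',').size();
}

// Write a record back out as one tab-separated VCF line (no trailing newline).
std::ostream& operator<<(std::ostream& os, const VcfRecord& r) {
    for (std::size_t i = 0; i < r.fields.size(); ++i) {
        if (i) os << '\t';
        os << r.fields[i];
    }
    return os;
}

// Read VCF from `in`, append NALT=<n> to each record's INFO column, and
// write VCF to `out`, so the output can be piped into another VCF tool.
void annotateStream(std::istream& in, std::ostream& out) {
    std::string line;
    VcfRecord rec;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '#') {
            // Declare the new tag just before the #CHROM column header line.
            if (line.rfind("#CHROM", 0) == 0)
                out << "##INFO=<ID=NALT,Number=1,Type=Integer,"
                       "Description=\"Number of ALT alleles\">\n";
            out << line << '\n';
            continue;
        }
        rec.fields = split(line, '\t');
        if (rec.fields.size() < 8) continue;  // silently skip malformed lines
        std::string tag = "NALT=" + std::to_string(countAlt(rec));
        std::string& info = rec.fields[7];
        info = (info == "." || info.empty()) ? tag : info + ";" + tag;
        out << rec << '\n';
    }
}
```

Calling `annotateStream(std::cin, std::cout)` from a `main` function would turn this into a command-line filter. Because the annotation stays inside the VCF record rather than going to a side file, the output remains valid input for the next tool in a pipe.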

modified 6.4 years ago • written 6.4 years ago by Erik Garrison
Frédéric Bigey (Montpellier, France) wrote:

Have a look at the Perl API (Vcf.pm) from VCFtools.

"VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc."

written 6.5 years ago by Frédéric Bigey