VCF Programming Language APIs / Parsers Available?
4
3
11.4 years ago
William ★ 5.3k

Are there any programming language APIs available yet for working with VCF files? The only tool I know of is VCFtools, which as far as I know has only a command-line interface and no application programming interface.

What I am looking for is the VCF equivalent of Picard, Samtools or Bamtools: a library that you can use from your own code to parse through VCF files.

Preferably an object-oriented library in Java, but anything in Python or Perl will also do.

vcf api programming • 6.2k views
7
11.4 years ago

See the latest post on my blog:

"Reading/Writing a VCF file with the GATK-API."

http://plindenbaum.blogspot.fr/2012/11/readingwriting-vcf-file-with-gatk-api.html

0

Nice. GATK of course has a VCF reader and writer in its source. I will have a look at this.

0

I copied your code and it seems to work, but I get an exception when closing the VCF writer. Did you run into this?

Exception in thread "main" java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.refdata.tracks.IndexDictionaryUtils.setIndexSequenceDictionary(IndexDictionaryUtils.java:85)
    at org.broadinstitute.sting.utils.variantcontext.writer.IndexingVariantContextWriter.close(IndexingVariantContextWriter.java:100)
    at org.broadinstitute.sting.utils.variantcontext.writer.VCFWriter.close(VCFWriter.java:147)

0

I am using the latest GATK lite jar file as a library.

0

Strange. I cannot help you with only this stack trace.

0

Well, the writer seems to write all the output before crashing on close, so I'll just look into it later.

7
11.4 years ago
Erik Garrison ★ 2.4k

C++: vcflib

The design was meant to be similar to bamtools, which I found very easy to work with. You open a VCF file and iterate through its records, populating an object with the VCF record on each iteration. You can write out VCF records using the << operator. Documentation is sparse because I continuously develop the library as a way of dealing with edge cases not handled by other systems such as VCFtools or the GATK. Variant.h should provide ample self-documentation, and function names are meant to be self-explanatory.

Example usage, counting the number of alternate alleles in a VCF file (taken from vcfcountalleles):

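#include <iostream>
#include <string>

#include "Variant.h"

using namespace std;
using namespace vcf;

int main(int argc, char** argv) {

    // Expect the path to a VCF file as the only argument.
    if (argc < 2) {
        cerr << "usage: " << argv[0] << " <vcf file>" << endl;
        return 1;
    }

    VariantCallFile variantFile;
    string filename = argv[1];
    variantFile.open(filename);
    if (!variantFile.is_open()) {
        cerr << "could not open " << filename << endl;
        return 1;
    }

    int uniqueAlleles = 0;

    // Reuse one Variant object; each call fills it with the next record.
    Variant var(variantFile);
    while (variantFile.getNextVariant(var)) {
        // Add the number of alleles listed for this record.
        uniqueAlleles += var.alleles.size();
    }

    cout << uniqueAlleles << endl;

    return 0;

}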

There are a lot of existing functions in the library for doing everything from haplotype-based allele intersection to filtering to genotype annotation. The stock programs in the library are almost entirely designed to read and write VCF, so, for instance, whenever possible annotations are added to records rather than dumped in a separate format, allowing the streaming of the results of one into the next and so on.

I intend to document them and will post a news item here on BioStar and also seqanswers when I do.
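
To make the read, annotate, write pattern concrete, here is a rough sketch that does not use vcflib at all. The Record struct, parseLine, annotate and the NALT tag are hypothetical names made up for this illustration, and the parsing is deliberately naive (tab splitting only, no validation):

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a VCF record; this is NOT vcflib's Variant class.
struct Record {
    // Columns in file order: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
    std::vector<std::string> fields;

    // Number of comma-separated entries in the ALT column (index 4).
    int altCount() const {
        if (fields.size() < 5 || fields[4] == ".") return 0;
        int n = 1;
        for (char c : fields[4]) {
            if (c == ',') ++n;
        }
        return n;
    }
};

// Split one tab-delimited data line into rec; false if fewer than 8 columns.
bool parseLine(const std::string& line, Record& rec) {
    rec.fields.clear();
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) rec.fields.push_back(field);
    return rec.fields.size() >= 8;
}

// Write a record back out as a tab-delimited line (without the newline).
std::ostream& operator<<(std::ostream& out, const Record& rec) {
    for (std::size_t i = 0; i < rec.fields.size(); ++i) {
        if (i > 0) out << '\t';
        out << rec.fields[i];
    }
    return out;
}

// Copy header lines unchanged and append NALT=<count> to INFO on each record.
// A real tool would also add a matching ##INFO line to the header.
void annotate(std::istream& in, std::ostream& out) {
    std::string line;
    Record rec;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '#') {
            out << line << '\n';
            continue;
        }
        if (!parseLine(line, rec)) continue;  // skip malformed lines
        std::string& info = rec.fields[7];
        const std::string tag = "NALT=" + std::to_string(rec.altCount());
        info = (info.empty() || info == ".") ? tag : info + ";" + tag;
        out << rec << '\n';
    }
}
```

Calling annotate(std::cin, std::cout) from main would turn this into a filter whose output is still VCF-shaped, so it can be piped straight into the next tool.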

4
11.4 years ago

Have a look at the Perl API (Vcf.pm) from VCFtools.

"VCFtools contains a Perl API (Vcf.pm) and a number of Perl scripts that can be used to perform common tasks with VCF files such as file validation, file merging, intersecting, complements, etc."
