Recommendations For Python VCF Parser/Writer?
9.8 years ago
Reece ▴ 310

I'm looking for a VCF 4.1 parser and writer. I'm aware of these:

Do you know of other options or have recommendations to share?

9.8 years ago
brentp 23k

I've looked at the ones you mention and any others I could find. This one seems to be the most complete and easiest to use: https://github.com/jdoughertyii/PyVCF

usage is like:

for rec in VCFReader(open('some.vcf')):
    print rec.CHROM, rec.POS, rec.filter, rec.info["AF"]


However, it does not have a writer class.

EDIT:

This has become the official fork, and it has a writer class.
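To make concrete what a reader/writer pair for VCF has to do, here is a minimal standard-library sketch. It is not how PyVCF is implemented; the function names (`parse_info`, `read_vcf`, `format_record`) are made up for illustration, and it only handles the eight fixed columns (no genotype columns, no header-driven type conversion).

```python
import io


def parse_info(text):
    """Turn an INFO string such as 'AF=0.5;DB' into a dict."""
    info = {}
    if text == ".":          # '.' means "no INFO" in VCF
        return info
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        # Flags such as 'DB' carry no value; store them as True.
        info[key] = value if sep else True
    return info


def read_vcf(handle):
    """Yield one dict per data line, skipping '#' header lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")[:8]
        chrom, pos, vid, ref, alt, qual, filt, info = cols
        yield {
            "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
            "ALT": alt, "QUAL": qual, "FILTER": filt,
            "INFO": parse_info(info),
        }


def format_record(rec):
    """Serialize a record dict back into a tab-separated VCF data line."""
    info = ";".join(
        key if value is True else "%s=%s" % (key, value)
        for key, value in rec["INFO"].items()
    ) or "."
    fields = [rec["CHROM"], str(rec["POS"]), rec["ID"], rec["REF"],
              rec["ALT"], rec["QUAL"], rec["FILTER"], info]
    return "\t".join(fields)


# A tiny in-memory example file (made-up content in VCF 4.1 layout).
EXAMPLE = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tAF=0.5;DB\n"
)
records = list(read_vcf(io.StringIO(EXAMPLE)))
```

A real library also parses the `##INFO` and `##FORMAT` header lines to convert values to the declared types, which is most of the work this sketch skips.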

I am using that library as well (with a couple of minor mods) for another project. Works okay for me.

The idea for the UPPER was to distinguish native (upper) fields from derived (lower) attributes/methods. For better or worse...

Thanks. Any idea why the field names are UPPERCASE?

Not sure, other than that's how they appear in the VCF file. You could file a bug at https://github.com/jamescasbon/PyVCF
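The upper/lower convention mentioned in this thread (native VCF columns in UPPERCASE, derived values in lowercase) can be sketched roughly as below. The `Record` class and its `is_snp` and `end` properties are illustrative assumptions written for this example, not code taken from PyVCF.

```python
class Record:
    """One VCF data line; hypothetical class for illustration only."""

    def __init__(self, CHROM, POS, REF, ALT):
        # Native fields: named exactly as the columns in the VCF header line.
        self.CHROM = CHROM
        self.POS = POS          # 1-based position, as in the file
        self.REF = REF
        self.ALT = ALT          # list of alternate alleles

    # Derived attributes: computed from the native fields, so lowercase.
    @property
    def is_snp(self):
        """True when REF and every ALT allele are single bases."""
        return len(self.REF) == 1 and all(len(a) == 1 for a in self.ALT)

    @property
    def end(self):
        """1-based position of the last reference base covered."""
        return self.POS + len(self.REF) - 1


snp = Record("20", 14370, "G", ["A"])
deletion = Record("20", 100, "GTC", ["G"])
```

With this split, a reader of the calling code can tell at a glance that `rec.POS` came straight from the file while `rec.end` was computed.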

PyVCF is too slow. Is there anything else in Python that uses a C++ backend?

9.7 years ago
Erik Garrison ★ 2.3k

For C++, I've written vcflib. It has utilities for a number of functions, such as haplotype-based file comparisons (for accurate indel comparisons), filtering, and statistical summarization. It can operate on uncompressed files or on compressed, tabix-indexed VCF files. Mostly, I've used it as a reader/writer class for other projects.

23 months ago
Dataman ▴ 340

I know this question is rather old and already has an answer, but it is still relevant. A more recent alternative for parsing VCF files in Python (both versions 2 and 3) is cyvcf2, written by two well-known bioinformaticians, Brent Pedersen and Aaron Quinlan.
Documentation: http://brentp.github.io/cyvcf2/ and source code: https://github.com/brentp/cyvcf2.
Journal article: https://academic.oup.com/bioinformatics/article/33/12/1867/2971439
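For reference, reading and writing with cyvcf2 looks roughly like the sketch below. It assumes cyvcf2 is installed; the function name `copy_passing_variants` and its file-path arguments are hypothetical, and the behaviour should be checked against the cyvcf2 documentation for your version.

```python
def copy_passing_variants(in_path, out_path):
    """Copy variants that passed all filters to a new VCF; return the count."""
    # Imported here so that merely defining the function does not
    # require cyvcf2 to be installed.
    from cyvcf2 import VCF, Writer

    vcf = VCF(in_path)
    writer = Writer(out_path, vcf)   # the input file serves as header template
    kept = 0
    for variant in vcf:
        # cyvcf2 reports FILTER as None when the record is PASS or '.'.
        if variant.FILTER is None:
            writer.write_record(variant)
            kept += 1
    writer.close()
    vcf.close()
    return kept
```

It would be called as `copy_passing_variants("some.vcf", "passed.vcf")`. Inside the loop, the fixed columns are available as `variant.CHROM`, `variant.POS`, `variant.REF` and `variant.ALT`, and INFO values via `variant.INFO.get("AF")`.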