Question: Using VCFTools (v0.1.13) with ExAC (release 0.3.1)
3
gravatar for areyoujokingme
2.9 years ago by
United States
areyoujokingme60 wrote:

I'm trying to extract allele frequency information for all variants in the latest ExAC data release. I'm currently using vcftools version 0.1.13, and am coming into errors that relate, I think, to versions.

When I validate the ExAC data VCF file, I get an error message for every INFO line in the file, that says something like the below:

>> vcf-validator ExAC.r0.3.1.sites.vep.vcf
INFO field at 1:13404 .. INFO tag [AC_Het=0,1,0] expected different number of values (expected 2, found 3)
FILTER field at 1:13418 ..  The filter(s) [AC_Adj0_Filter] not listed in the header.

This suggests, I believe, that the ExAC data VCF file is not in the format vcftools wants it to be, which is v4.0, v4.1 or v4.2. If this is true, how do I make the ExAC data VCF file vcftools- compatible?

The real error I'm trying to overcome comes from when I try to extract the frequency information:

>> vcftools --gzvcf ExAC.r0.3.1.sites.vep.vcf.gz --freq --chr 1 --out chr1_freq

VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--gzvcf ExAC.r0.3.1.sites.vep.vcf.gz
--chr 1
--freq
--out chr1_freq

Using zlib version: 1.2.8
After filtering, kept 0 out of 0 Individuals
Error: Require Genotypes in VCF file in order to output Frequency Statistics.

Can anyone help me out?

vcftools exac format • 2.1k views
ADD COMMENTlink modified 2.9 years ago by Pierre Lindenbaum116k • written 2.9 years ago by areyoujokingme60
3
gravatar for Pierre Lindenbaum
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

Hum... I would say it's a bug in Exac:

##INFO=<ID=AC_Het,Number=A,Type=Integer,Description="Adjusted Heterozygous Counts">
(...)
1   13404   .   G   A,T 136.58  VQSRTrancheSNP99.60to99.80  (...)AC_Het=0,1,0;

as far as I know the VCF format, there should have only two numbers for AC_Het for each ALT allele.

ADD COMMENTlink written 2.9 years ago by Pierre Lindenbaum116k
3

Apologies - this is indeed a bug in ExAC. AC_Het should be Number=. and unfortunately this is the best way to do this at the moment, since its a semi-complicated field: in reality, it's something like Number="G minus R" but since something like that does not exist in the spec, we'll have to settle for '.' and one will have to parse it manually.

We will fix the VCF with this and other minor header issues shortly.

ADD REPLYlink written 2.8 years ago by konradjk30

I just found the version number of ExAC, and it is actually v4.2! This cannot be a version issue, now. Can you explain how it's possible for ExAC to have a bug? For example, why did you paste the line above? Sorry if I'm missing something obvious!

ADD REPLYlink written 2.9 years ago by areyoujokingme60

how it's possible for ExAC to have a bug?

A: What is the reason for most software errors in Bioinformatics according to you?

For example, why did you paste the line above?

  • because vcftools says there is an error for "INFO field at 1:13404 "
  • because there are two ALT alleles and the definition of AC_HEt is 'number=A', that is to say, one value for each ALT allele. I've written an issue: https://github.com/konradjk/exac_browser/issues/256
ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum116k

Thanks for writing an issue! Hopefully we hear from them soon!

ADD REPLYlink written 2.9 years ago by areyoujokingme60

It may be counting the 3 possible heterozygous states: G/A, G/T, and A/T

ADD REPLYlink written 2.9 years ago by rbagnall1.3k

@rbagnall yes, but then ID=AC_Het,Number=A shouldn't be used.

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum116k

UPDATE : the bug is now fixed: https://github.com/konradjk/exac_browser/issues/256#issuecomment-282719065

ADD REPLYlink written 23 months ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 652 users visited in the last hour