Question: How to filter mutant allele frequency in Variant Effect Predictor (VEP)?
3.0 years ago, kin182 wrote:

This is my first time using the Variant Effect Predictor (VEP), and I would like to use it to annotate VCF files from WES data. I tried to set up filters to keep only the mutations with a mutant allele frequency higher than 0.2 (number of mutant counts / total number of counts > 0.2).

This is the code I used:

./vep --cache --offline --symbol --coding_only \
--freq_freq 0.2 --freq_gt_lt gt --freq_filter include \
-i input.vcf -o output.txt

I checked the results by loading the BAM files in IGV. However, almost all the mutations in the results so far have an allele frequency < 0.2. For example:

Total counts: 118
A: 0
C: 0
G: 102 (86%, 86+, 16-)
T: 16 (14%, 16+, 0-)
N: 0

The G -> T mutation has a frequency of only 0.14.

Does anyone have experience with VEP? I may be using it incorrectly; could you point out what I am missing? Thank you.

ensembl vep vcf perl • 1.8k views
3.0 years ago, EnsemblWill (United Kingdom) wrote:

VEP cannot do the filtering on the data as you have it.

Typically, frequency data is encoded in the INFO field of the VCF file, and VEP's accompanying filter script would allow you to filter on such a field. However, looking at the snippet you have pasted, VEP is unable to filter on this because it is not in a standard format. Indeed, I'd be surprised if any software could do this out of the box, except perhaps whatever was used to generate this VCF.

If I were doing this task I'd write a short perl script to process the data and filter.
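
A minimal sketch of such a script, written here in Python rather than perl for illustration. It rests on a guess about the VCF lines posted further down in this thread: that the first colon-separated value in the ninth column is `ref_count,alt_count` (for example `14,6`). Nothing in the thread confirms that layout, so check it against whatever produced the file. The names `alt_fraction` and `filter_lines` are hypothetical, and the example records are truncated.

```python
def alt_fraction(sample_field):
    """Return alt / (ref + alt) from a field like '14,6:20:...'.

    ASSUMPTION: the first colon-separated value is 'ref_count,alt_count'.
    Returns None when the field cannot be parsed or the depth is zero.
    """
    counts = sample_field.split(":")[0].split(",")
    if len(counts) != 2:
        return None
    try:
        ref, alt = int(counts[0]), int(counts[1])
    except ValueError:
        return None
    total = ref + alt
    return alt / total if total > 0 else None


def filter_lines(lines, min_fraction=0.2):
    """Yield header lines unchanged, plus records whose alt fraction exceeds min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split()
        if len(fields) < 9:
            continue  # not enough columns to hold the count field
        frac = alt_fraction(fields[8])
        if frac is not None and frac > min_fraction:
            yield line


# The three records from this thread, with the ninth column truncated for brevity.
records = [
    "1 69428 . T G . . . 0,2:2:2:0:2\n",
    "1 69511 . A G . . . 0,2:2:2:0:2\n",
    "1 183629 . G A . . . 14,6:20:6:13:6\n",
]
for kept in filter_lines(records, min_fraction=0.2):
    print(kept, end="")
```

The filtered file could then be passed to VEP for annotation as usual.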

3.0 years ago, cpad0112 wrote:

AF is the field for filtering by allele frequency. Copied from the VEP manual:

Note that for numeric fields, such as the *AF allele frequency fields, filter_vep does not consider the absence of a value for that field as equivalent to a 0 value. For example, if you wish to find rare variants by finding those where the allele frequency is less than 1% or absent, you should use the following:

--filter "AF < 0.01 or not AF"
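
The distinction in that passage can be illustrated with a small Python sketch. This is not how filter_vep is implemented; `passes_lt_only` and `passes_rare_filter` are hypothetical helpers that only mirror the logic described above, with `None` standing in for a missing AF value.

```python
def passes_lt_only(af, threshold=0.01):
    """Mirror of 'AF < 0.01' alone: a missing value is not treated as 0, so it fails."""
    return af is not None and af < threshold


def passes_rare_filter(af, threshold=0.01):
    """Mirror of 'AF < 0.01 or not AF': keep rare variants and variants with no AF."""
    return af is None or af < threshold


# Hypothetical AF values: rare, common, and absent.
for af in (0.001, 0.3, None):
    print(af, passes_lt_only(af), passes_rare_filter(af))
```

Only the variant with no AF value is treated differently by the two expressions.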

Please post a few lines of the VCF here (with or without headers) that are not getting filtered by your pipeline.


Here are a few lines of VCF:

1 69428 . T G . . . 0,2:2:2:0:2:0:2:35.5:2:0:0:0:34.5:12.5:0
1 69511 . A G . . . 0,2:2:2:0:2:0:2:37.5:0:0:2:0:40:2:0
1 183629 . G A . . . 14,6:20:6:13:6:14:20:37.5:36.8571:6:13:0:1:32.1667:28.6429:527.506:431.971:0

I want to filter on the mutant allele frequency from my own data (an in-house frequency): the number of counts with the mutation divided by the total number of counts in the BAM file. I do not want to filter on allele frequencies from the 1000 Genomes data. Can VEP do this?

Reply written 3.0 years ago by kin182

VEP assumes standard VCF when filtering standard fields such as AF. Unless the source file has AF in standard format, it won't work.

Reply written 3.0 years ago by cpad0112