Question: How to filter mutant allele frequency in Variant Effect Predictor (VEP)?
kin182 wrote (23 months ago):

This is my first time using Variant Effect Predictor (VEP), and I would like to use it to annotate the VCF files I got from WES data. I tried to set up filters to keep only the mutations with a mutant allele frequency higher than 0.2 (number of mutant reads / total read count > 0.2).

This is the code I used:

./vep --cache --offline --symbol --coding_only \
--freq_freq 0.2 --freq_gt_lt gt --freq_filter include \
-i input.vcf -o output.txt

I checked the results by loading the BAM files in IGV. However, almost all the mutations in the results so far had an allele frequency < 0.2. For example:

Total counts: 118
A: 0
C: 0
G: 102 (86%, 86+, 16-)
T: 16 (14%, 16+, 0-)
N: 0

The G -> T mutation has an allele frequency of only 0.14 (16/118).

Does anyone have experience using VEP? I may be using it incorrectly; could you point out what I am missing? Thank you.

ensembl vep vcf perl • 1.3k views
EnsemblWill (United Kingdom) wrote (23 months ago):

VEP cannot do the filtering on the data as you have it.

Typically, frequency data is encoded in the INFO field of the VCF file, and VEP's accompanying filter script would allow you to filter on such a field. However, looking at the snippet you have pasted, VEP is unable to filter on this because it is not in a standard format. Indeed, I'd be surprised if any software could do this out of the box, except perhaps whatever was used to generate this VCF.

If I were doing this task, I'd write a short Perl script to process the data and filter.
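A minimal sketch of such a pre-filtering script, written in Python rather than Perl. It assumes the sample column carries an AD entry of the form ref_count,alt_count (as in the VCF lines posted in this thread), that each record has a single sample and a single ALT allele, and that columns are separated by whitespace. The function names and the 0.2 default are illustrative, not part of VEP. Note that the fraction here is alt / (ref + alt) from AD, which can differ slightly from the total read count shown in IGV.

```python
def alt_fraction(format_keys, sample_values):
    """Return alt / (ref + alt) from the AD entry, or None if it cannot be computed."""
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    ad = fields.get("AD")
    if ad is None:
        return None
    try:
        counts = [int(x) for x in ad.split(",")]
    except ValueError:
        return None
    total = sum(counts)
    # Assumption: biallelic record, so AD is exactly "ref,alt".
    if len(counts) < 2 or total == 0:
        return None
    return counts[1] / total


def filter_vcf_lines(lines, min_fraction=0.2):
    """Yield header lines unchanged and records whose alt fraction exceeds min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.split()  # assumption: no spaces inside any column
        if len(cols) < 10:
            continue  # no FORMAT/sample columns (or blank line): nothing to filter on
        frac = alt_fraction(cols[8], cols[9])
        if frac is not None and frac > min_fraction:
            yield line


def filter_vcf_file(in_path, out_path, min_fraction=0.2):
    """Write the filtered records from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in filter_vcf_lines(src, min_fraction):
            dst.write(line)
```

Calling filter_vcf_file("input.vcf", "filtered.vcf") would write the filtered records to a new file (both file names are placeholders), which could then be passed to VEP with -i filtered.vcf for annotation.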

cpad0112 (India) wrote (23 months ago):

AF is the field for filtering by allele frequency (copied from the VEP manual: http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html):

Note that for numeric fields, such as the *AF allele frequency fields, filter_vep does not consider the absence of a value for that field as equivalent to a 0 value. For example, if you wish to find rare variants by finding those where the allele frequency is less than 1% or absent, you should use the following:

--filter "AF < 0.01 or not AF"

Please post a few lines of the VCF here (with or without headers) that are not getting filtered by your pipeline.


Here are a few lines of VCF:

1 69428 . T G . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.var:codon.dir 0,2:2:2:0:2:0:2:35.5:2:0:0:0:34.5:12.5:0
1 69511 . A G . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.var:codon.dir 0,2:2:2:0:2:0:2:37.5:0:0:2:0:40:2:0
1 183629 . G A . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:mean.quality.ref:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.mean.ref:read.pos.var:read.pos.var.ref:codon.dir 14,6:20:6:13:6:14:20:37.5:36.8571:6:13:0:1:32.1667:28.6429:527.506:431.971:0

I want to filter on the mutant allele frequency computed from my own data (an in-house frequency: the number of reads carrying the mutation divided by the total number of reads in the BAM file), not on the allele frequency from 1000 Genomes. Can VEP do this?

(reply by kin182, 23 months ago)

VEP assumes standard VCF when filtering on standard fields such as AF. Unless the source file has AF in the standard format, this won't work.

(reply by cpad0112, 23 months ago)