Ensembl vep. How to filter population frequency less than 1%?
2.5 years ago
Gonzalo ▴ 20

Hi everyone, I have found many answers on this site without ever asking a question, so this is my first query. I am working with Ensembl VEP to annotate and filter a VCF file. With this command I can annotate and filter the variants:

vep --species homo_sapiens --assembly GRCh37 --offline --xref_refseq --failed 1 --check_existing --no_escape --filter_common --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz --vcf --input_file AML1.vcf -o AML1_vep.vcf

With this command, around 40 variants with population frequencies greater than 1% are removed; however, 7 variants with frequencies greater than 1% are not. I can see that these variants have no data in the AF column, which is the column the filter (--filter_common) acts on. I also tried --filter "AFR_AF <0.01 or not AFR_AF" in place of --filter_common, but I got the same variants as with --filter_common. Could someone tell me what I'm doing wrong? Thank you very much!

2.5 years ago
Ben_Ensembl ★ 2.4k

Hi Gonzalo,

To use the VEP frequency filters for a specific continental population, such as the AFR population in your query, you need to use the --freq_pop [pop] flag, as described in the documentation: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt

The filtering options here filter your results before they are written to your output file. Using VEP's filtering script, it is possible to filter your results after VEP has run. This way you can retain all of the results and run multiple filter sets on the same results to find different data of interest: https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html
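As a rough sketch of that post-filtering step, the snippet below builds a filter expression that keeps annotations whose frequency is below a threshold or missing, and hands it to filter_vep. The helper names rare_or_missing and filter_rare and the file names in the example call are hypothetical, and AF stands for whichever frequency sub-field is actually present in your annotated VCF.

```shell
# Build a filter_vep expression: keep entries whose frequency field is
# below the threshold, or where the field is absent or empty.
rare_or_missing() {
  # $1 = CSQ sub-field name (e.g. AF, AMR_AF), $2 = maximum frequency
  printf '%s < %s or not %s\n' "$1" "$2" "$1"
}

# Hypothetical wrapper around filter_vep for an already-annotated VCF.
# $1 = VEP-annotated input, $2 = output file, $3 = field, $4 = threshold
filter_rare() {
  filter_vep -i "$1" -o "$2" \
    --filter "$(rare_or_missing "$3" "$4")" \
    --force_overwrite
}

# Example call (needs filter_vep on PATH; not executed here):
# filter_rare AML1_vep.vcf AML1_vep.rare.vcf AF 0.01
```

Because the unfiltered VEP output is kept, the same wrapper can be re-run with a different field or threshold without re-annotating.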

Best wishes

Ben


Hi Ben, Thank you very much for your answer!

When I add --freq_pop to the vep script, I don't get any change.

vep --species homo_sapiens --assembly GRCh37 --offline --filter_common --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --everything --freq_pop "AFR_AF < 0.01 or not AFR_AF" --pick_order canonical,tsl,biotype,rank,ccds,length --force_overwrite --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file in.vcf --output_file out.vep.vcf

If I run filter_vep after VEP, I don't see any changes either:

filter_vep -i in.vep.vcf --filter "AF < 0.01 or not AF" -o filtered.vep.vcf

I tried different options with --filter, and nothing changed.


No problem, Gonzalo. When using the --freq_pop filter in the VEP query itself, you need to add the --check_frequency flag and also specify the population and the value that the frequency filter applies to, for example:

--check_frequency --freq_pop 1KG_AFR --freq_freq 0.1
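As far as the VEP options page describes it, --check_frequency is meant to be used with all four --freq_* options together (population, threshold, comparison direction, and include/exclude). A minimal sketch that prints the full flag set so it can be appended to a vep command; the helper name vep_freq_flags is hypothetical, and the population and threshold in the example are placeholders.

```shell
# Print the frequency-filter flags for one population and threshold.
# With "gt" and "exclude", variants whose co-located known variant has a
# frequency greater than the threshold in that population are dropped.
vep_freq_flags() {
  # $1 = population code (e.g. 1KG_AMR), $2 = frequency threshold (e.g. 0.01)
  printf '%s\n' --check_frequency \
    --freq_pop "$1" --freq_freq "$2" \
    --freq_gt_lt gt --freq_filter exclude
}

# Example (needs vep on PATH; not executed here). The unquoted $(...) is
# intentional so that each flag becomes a separate argument:
# vep --offline --vcf --input_file in.vcf --output_file out.vep.vcf $(vep_freq_flags 1KG_AMR 0.01)
```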

When using the Filter VEP script with a VCF file as the input, filter_vep by default expects to find the VEP annotations encoded in the CSQ INFO key. You can use the --vcf_info_field option to change the INFO key that filter_vep decodes, depending on your VCF input file.
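One way to check which sub-fields are available for filtering is to read them from the VCF header, where VEP lists them after "Format:" in the description of its INFO key. A small sketch, assuming an uncompressed VCF and the usual header layout; the function name vep_csq_fields is hypothetical.

```shell
# List the sub-field names VEP recorded for an INFO key (default: CSQ).
# Expects a header line of the form:
#   ##INFO=<ID=CSQ,...,Description="... Format: Allele|Consequence|...">
vep_csq_fields() {
  # $1 = uncompressed VCF file, $2 = INFO key (optional, defaults to CSQ)
  grep -m1 "^##INFO=<ID=${2:-CSQ}," "$1" \
    | sed -e 's/.*Format: //' -e 's/">.*//' \
    | tr '|' '\n'
}

# Example (not executed here):
# vep_csq_fields in.vep.vcf | grep -n 'AF'
```

If this prints nothing for CSQ, the annotations may sit under a different key, which is the case --vcf_info_field is for. A frequency field that is missing from the list cannot be matched by a filter expression.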


Hi Ben,

Thank you very much for your comments, they were very helpful. I was able to filter most of the variants to population frequencies below 1% by adding "--check_frequency --freq_pop 1KG_AMR --freq_freq 0.01"; however, one variant was not removed (yellow in the image). When I changed --freq_freq from 0.01 to 0.001 and added "--freq_gt_lt gt --freq_filter exclude", I could see that two variants with frequencies less than 0.001 were removed (green in the image), but not this one. I can't figure out what the problem is.

[Two images attached to the original post; not reproduced here.]

This variant in question was removed using Filter vep as you suggested.

Best wishes


Hi Gonzalo,

No problem, very happy to help. I'm not sure why this variant is not being filtered by the flags you describe. Could you share the ID of the variant in question?

In any case, I'm glad that you were able to use the Filter VEP to achieve the filtering you needed.


Hi Ben, thanks for your reply.

Here I copy part of the row with the information.

The reference genome is GRCh37.

Best,

Hugo_Symbol  Chr  Start_Position  End_Position  Variant_Classification  Type  Ref_Allele  Tumor_seq  dbSNP_RS
TYK2         19   10491352        10491353      5'Flank                 INS   -           G          rs71297581

Please use ADD REPLY when responding to existing comments/posts to keep threads logically organized.


Hi Gonzalo,

The variant you are trying to filter out has two alt alleles: one allele has 1KG_AMR = 0.1427, while the other has no frequency from the 1000 Genomes Project. This may be the reason for the behaviour of the filter flags.
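To see this kind of per-allele difference directly in a VEP-annotated VCF, something like the following awk sketch can list one frequency sub-field for every CSQ block of every record. It assumes an uncompressed VCF with annotations under the CSQ key and with the allele as the first sub-field; the function name csq_field_per_allele is hypothetical.

```shell
# For each CSQ block, print CHROM, POS, ID, the block's first sub-field
# (normally the allele) and the requested sub-field, or NA if it is empty.
csq_field_per_allele() {
  # $1 = uncompressed VEP-annotated VCF, $2 = CSQ sub-field name (e.g. AMR_AF)
  awk -v want="$2" -F'\t' '
    /^##INFO=<ID=CSQ,/ {
      # Recover the sub-field order from the header and locate the wanted one
      hdr = $0
      sub(/.*Format: /, "", hdr); sub(/">.*/, "", hdr)
      n = split(hdr, names, "|")
      for (i = 1; i <= n; i++) if (names[i] == want) col = i
      next
    }
    /^#/ { next }
    col {
      m = split($8, info, ";")
      for (j = 1; j <= m; j++) {
        if (info[j] ~ /^CSQ=/) {
          k = split(substr(info[j], 5), blocks, ",")
          for (b = 1; b <= k; b++) {
            split(blocks[b], f, "|")
            val = (f[col] == "" ? "NA" : f[col])
            print $1 "\t" $2 "\t" $3 "\t" f[1] "\t" val
          }
        }
      }
    }
  ' "$1"
}

# Example (not executed here):
# csq_field_per_allele output_vep.vcf AMR_AF | grep rs71297581
```

A site where one allele shows a value and another shows NA is the situation described above.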

Could you please share your complete VEP input, query and output for this variant?


Hi Ben,

Thank you for the reply. Do you mean the VCF files? Yes, of course.

In the link you will find the input.vcf file, the output_vep.vcf and also the output_vep_vcf.maf generated with the vcf2maf scripts.

vcf files

Best wishes


Hi Gonzalo,

Thanks for providing the input data. In our hands, using the filtering flags in the VEP query itself also produces the same behaviour that you have observed. We will look into this, although we're not sure what is causing this to happen.

As I've said previously (and as you've done), we always advise people to use the Filter VEP script for filtering. I believe the Filter VEP script has performed as expected for you.


Hi Ben,

Thank you very much for your valuable response and for your time. It is good to know that it is not a bug in my script. As you say, with Filter VEP it is possible to filter and customize the analysis, but I was interested in the statistics that Ensembl VEP produces (for example, how many variants were removed), which is why I wanted to do it that way.

I have also tried running Ensembl VEP on the output of Filter VEP; in this way it is possible to obtain the statistics, predictions and graphs provided by VEP. It's great!

Best wishes,


Hi Gonzalo,

Just to update you that we'll include a fix in Ensembl 106, due to be released in 2022, to correct the behaviour you observed when using the filters in the VEP query.


Great! Thank you for the comment. Best wishes
