Ensembl vep. How to filter population frequency less than 1%?
7 weeks ago
Gonzalo ▴ 10

Hi everyone, I have gotten many answers from this site without ever asking a question; this is my first query. I am working with Ensembl VEP to annotate and filter a VCF file. With this command I can annotate and filter the variants:

vep --species homo_sapiens --assembly GRCh37 --offline --xref_refseq --failed 1 --check_existing --no_escape --filter_common --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz --vcf --input_file AML1.vcf -o AML1_vep.vcf


With this command, around 40 variants with population frequencies greater than 1% are removed; however, 7 variants with frequencies greater than 1% are not. I can see that these variants have no data in the AF column, which is the column --filter_common acts on. I also tried removing --filter_common and using --filter "AFR_AF < 0.01 or not AFR_AF", but got the same result: the same variants as with --filter_common. Could someone tell me what I'm doing wrong? Thank you very much!

7 weeks ago
Ben_Ensembl ★ 1.8k

Hi Gonzalo,

To use the VEP frequency filters for specific continental populations, such as the AFR population in your query, you need to use the --freq_pop [pop] flag, as described in the following documentation: https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#filt

The filtering options here filter your results before they are written to your output file. Using VEP's filtering script, it is possible to filter your results after VEP has run. This way you can retain all of the results and run multiple filter sets on the same results to find different data of interest: https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html

Best wishes

Ben


When I add --freq_pop to the VEP command, nothing changes.

vep --species homo_sapiens --assembly GRCh37 --offline --filter_common --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --everything --freq_pop "AFR_AF < 0.01 or not AFR_AF" --pick_order canonical,tsl,biotype,rank,ccds,length --force_overwrite --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/102_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file in.vcf --output_file out.vep.vcf


If I run filter_vep after VEP, I don't see any change either:

filter_vep -i in.vep.vcf --filter "AF < 0.01 or not AF" -o filtered.vep.vcf


I tried different options with --filter, and nothing changed.


No problem, Gonzalo. When using the --freq_pop filter in the VEP query itself, you also need the --check_frequency flag, and you need to specify both the population and the frequency value that the filter applies to, e.g.:

--check_frequency --freq_pop 1KG_AFR --freq_freq 0.1

When using the Filter VEP script with a VCF file as input, filter_vep by default expects to find VEP annotations encoded in the CSQ INFO key. If your VCF uses a different key, you can use the --vcf_info_field option to change the INFO key that filter_vep decodes.
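To check which fields are actually present in the CSQ annotation before writing a filter, one option is to read the "Format:" list that VEP writes into the CSQ header line. The following is only a minimal sketch: the helper name csq_field_index is made up for illustration, and it assumes the annotations sit under the default CSQ key.

```shell
# Hypothetical helper: print the 1-based position of a field (e.g. AF)
# inside each CSQ entry, read from the "Format:" list in the VCF header.
# Assumes the default CSQ INFO key; prints nothing if the field is absent.
csq_field_index() {
    # $1 = field name to look up; stdin = VCF text (header is enough)
    awk -v want="$1" '
        /^##INFO=<ID=CSQ,/ {
            sub(/.*Format: /, "")   # drop everything up to the field list
            sub(/">.*/, "")         # drop the closing quote and bracket
            n = split($0, f, "|")
            for (i = 1; i <= n; i++) if (f[i] == want) { print i; exit }
        }
    '
}
```

For example, `csq_field_index AF < in.vep.vcf` prints the position of AF in the CSQ entries, or nothing if VEP did not write that field.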


Hi Ben,

Thank you very much for your comments, they were very helpful. I was able to filter most of the variants with population frequencies less than 1% by adding "--check_frequency --freq_pop 1KG_AMR --freq_freq 0.01"; however, one variant was not removed (yellow in the image). When I changed --freq_freq from 0.01 to 0.001 and added "--freq_gt_lt gt --freq_filter exclude", I could see that two variants with frequencies less than 0.001 were removed (green in the image), but not this one. I can't figure out what the problem is.

The variant in question was removed using Filter VEP, as you suggested.

Best wishes


Hi Gonzalo,

No problem, very happy to help. I'm not sure why this variant is not being filtered using the flags you describe. Could you share the ID of the variant in question?

In any case, I'm glad that you were able to use the Filter VEP to achieve the filtering you needed.


Here I copy part of the row with the information.

The reference genome is GRCh37.

Best,

Hugo_Symbol  Chr  Start_Position  End_Position  Variant_Classification  Type  Ref_Allele  Tumor_seq  dbSNP_RS
TYK2         19   10491352        10491353      5'Flank                 INS   -           G          rs71297581


Please use ADD REPLY when responding to existing comments/posts to keep threads logically organized.


Hi Gonzalo,

The variant you are trying to filter out has two alt alleles: one allele has 1KG_AMR = 0.1427, while the other has no frequency from the 1000 Genomes Project. This may be the reason for the behaviour of the filter flags.

Could you please share your complete VEP input, query and output for this variant?
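To see this kind of situation in an annotated VCF, it can help to list the frequency value per CSQ entry, since VEP writes one comma-separated entry per allele and transcript, with fields separated by "|". The sketch below is illustrative only: csq_values is a made-up helper, the field position has to be looked up in the CSQ header of your own file, and it assumes the default CSQ key.

```shell
# Hypothetical helper: for every CSQ entry of every record, print
# variant ID, allele and the value of one CSQ field ("missing" if empty).
csq_values() {
    # $1 = 1-based position of the field inside each CSQ entry
    # stdin = VEP-annotated VCF (header lines are skipped)
    awk -F'\t' -v idx="$1" '
        /^#/ { next }
        {
            n = split($8, info, ";")            # INFO is column 8
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^CSQ=/) continue
                m = split(substr(info[i], 5), entries, ",")
                for (j = 1; j <= m; j++) {
                    split(entries[j], f, "|")
                    val = (f[idx] == "") ? "missing" : f[idx]
                    print $3 "\t" f[1] "\t" val
                }
            }
        }
    '
}
```

A record where one allele prints a frequency and the other prints "missing" is the mixed case described above, where a "below threshold or not set" condition can be satisfied by the allele without data.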


Hi Ben,

Thank you for the reply. Do you mean the VCF files? Yes, of course.

In the link you will find the input.vcf file, the output_vep.vcf and also the output_vep_vcf.maf generated with the vcf2maf scripts.

vcf files

Best wishes


Hi Gonzalo,

Thanks for providing the input data. In our hands, using the filtering flags in the VEP query itself also produces the same behaviour that you have observed. We will look into this, although we're not sure what is causing this to happen.

As I've said previously (and as you've done), we always advise people to use the Filter VEP script for filtering. I believe the Filter VEP script has performed as expected for you.


Hi Ben,

Thank you very much for your valuable response and for your time. It is good to know that it is not a bug in my script. As you say, Filter VEP makes it possible to filter and customize the analysis, but I was interested in the statistics that Ensembl VEP produces (for example, how many variants were removed), which is why I wanted to do it that way.

I have also tried running Ensembl VEP on the output of Filter VEP; this way it is possible to obtain the statistics, predictions and graphs provided by VEP. It's great!
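If only the number of removed records is needed, a rough count can also be obtained by comparing the non-header lines of the two VCF files. This is just a sketch with made-up helper names (count_variants, removed_by_filter); it counts VCF records, not individual alleles or consequences.

```shell
# Hypothetical helpers: count VCF records and report how many
# records a filtering step removed.
count_variants() {
    # Count lines that are not header lines; "|| true" keeps a zero
    # count (grep exit status 1) from being treated as an error.
    grep -vc '^#' "$1" || true
}

removed_by_filter() {
    # $1 = VCF before filtering, $2 = VCF after filtering
    before=$(count_variants "$1")
    after=$(count_variants "$2")
    echo "$((before - after))"
}
```

For example, `removed_by_filter in.vep.vcf filtered.vep.vcf` prints the difference in record counts between the two files.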

Best wishes,