Question: VEP: filter_vep, custom VCF annotation (kaviar)
0
gravatar for c.clarido
11 months ago by
c.clarido60
Netherlands/Rotterdam/Leiden University (Applied Science)
c.clarido60 wrote:

Hello biostars community,

I have recently annotated a multi-sample VCF using VEP standalone with plugins and custom annotations. For custom annotations, I used kaviar, goNl and gnomAD.

After annotation, I wanted to filter rare damaging missense variants. At this moment I have the following filter:

filter_vep \
    --force_overwrite \
    --input_file $1 \
    --output_file $2 \
    --only_matched \
    --filter "gnomADG_AF_NFE < 0.01 and kaviar_AF < 0.01 and goNl_AF <  0.01" \
    --filter "Polyphen2_HDIV_pred match D and SIFT4G_pred match D" \
    --filter "CADD_phred > 20"  \
    --filter "phastCons100way_vertebrate >= 0.3 and phyloP100way_vertebrate >= 0"

For example I have a variant with the following annotation:

chr1    1500144 rs186758134     C       T       2077.27 .       AC=10;AF=0.179;AN=56;BaseQRankSum=3.76;ClippingRankSum=0;DB;DP=446;ExcessHet=2.2123;FS=4.535;InbreedingCoeff=0.0252;MLEAC=10;MLEAF=0.179;MQ=58.52;MQRankSum=1.14;QD=13.15;ReadPosRankSum=1.58;SOR=0.185;VariantCaller=Intersection;CSQ=T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000308647|protein_coding|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000472194|retained_intron|||||||||||2296|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000474481|retained_intron|||||||||||3943|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000485748|retained_intron|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641679|transcribed_unprocessed_pseudogene|||||||||||3438|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641974|processed_transcript|||||||||||3106|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

For clarity, there are multiple CSQ fields:

CSQ=T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000308647|protein_coding|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000472194|retained_intron|||||||||||2296|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000474481|retained_intron|||||||||||3943|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000485748|retained_intron|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641679|transcribed_unprocessed_pseudogene|||||||||||3438|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641974|processed_transcript|||||||||||3106|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

Take kaviar AF values for example: 0.0010615&0.0162256

I tried running filter_vep however I get this kind messages in the stdout:

Argument "0.0001896&0.8633710&0.0302146" isn't numeric in numeric lt (<) at cedrick/conda/envs/ensembl-vep/share/ensembl-vep-98.0-0/modules/Bio/EnsEMBL/VEP/FilterSet.pm line 679, <GEN0> line 2305.

How does filter_vep go about processing this? Does it skip it?

vep filter_vep kaviar • 458 views
ADD COMMENTlink modified 11 months ago by Emily_Ensembl21k • written 11 months ago by c.clarido60
2
gravatar for Emily_Ensembl
11 months ago by
Emily_Ensembl21k
EMBL-EBI
Emily_Ensembl21k wrote:

The filter script will just skip the variant and it won't end up in your filtered data, even if it ought to pass the filter. Your best option is to clean up your data to ensure that it's in a format that the filters can manage.

ADD COMMENTlink written 11 months ago by Emily_Ensembl21k

Hello Emily,

Thank you for the quick response. I used the --custom as follow: --custom ~{kaViarPath},kaviar,vcf,exact,0,AF wouldn't it have matched only the exact coordinates?

ADD REPLYlink modified 11 months ago • written 11 months ago by c.clarido60

The custom VCFs match by rsID, not by coordinates, but these should be distinct.

You should check what's in the kaviar VCF. If you're getting these fields with "&" in them, is that what's in kaviar?

ADD REPLYlink written 11 months ago by Emily_Ensembl21k

Hello, actually kaviar had variants with multiple allele

ADD REPLYlink written 11 months ago by c.clarido60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1160 users visited in the last hour