Hello Biostars community,
I am using VEP (version 100) to annotate a VCF containing SVs with the AF values from a reference VCF. I am running VEP with the following parameters:
vep \
--cache \
--offline \
--dir_cache ${cache} \
--cache_version 98 \
-i $INPUT \
-o $OUTPUT \
--$OFMT \
--no_stats \
--force_overwrite \
--fork 4 \
--compress_output bgzip \
--regulatory \
--symbol \
--custom ${gnomadSVpath},gnomadSV01,vcf,exact,0,EUR_AF \
--plugin StructuralVariantOverlap,file=$gnomadSVpath,cols=EUR_AF,match_type=surrounding,label=gnomadSV02
As for the reference SVs, I am using the gnomAD SV v2 (lifted over to hg38):
21 45802812 nssv15966780 A <DUP> . . DBVARID;SVTYPE=DUP;END=46109638;SVLEN=306827;EXPERIMENT=1;SAMPLESET=1;REGIONID=nsv4284975;AC=1;AFR_AC=1;AMR_AC=0;EAS_AC=0;EUR_AC=0;OTH_AC=0;AF=4.6e-05;AFR_AF=0.000105;AMR_AF=0;EAS_AF=0;EUR_AF=0;OTH_AF=0;AN=21694;AFR_AN=9534;AMR_AN=1930;EAS_AN=2416;EUR_AN=7624;OTH_AN=190
21 46008428 nssv15966790 C <DUP> . . DBVARID;SVTYPE=DUP;END=46181422;SVLEN=172995;EXPERIMENT=1;SAMPLESET=1;REGIONID=nsv4273892;AC=2;AFR_AC=1;AMR_AC=0;EAS_AC=0;EUR_AC=1;OTH_AC=0;AF=9.2e-05;AFR_AF=0.000105;AMR_AF=0;EAS_AF=0;EUR_AF=0.000131;OTH_AF=0;AN=21694;AFR_AN=9534;AMR_AN=1930;EAS_AN=2416;EUR_AN=7624;OTH_AN=190
After annotation I get the following results:
chr21 46059888 chr21_46059889_deletion N <DEL> . . END=46059946;CSQ=deletion|downstream_gene_variant|MODIFIER|AP001476.2|ENSG00000226115|Transcript|ENST00000435738|lncRNA|||||||||||2322|1||Clone_based_ensembl_gene||||||||0&0.000131|100&100|nssv15966780&nssv15966790||,deletion|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000665311|||||||||||||||||||||||0&0.000131|100&100|nssv15966780&nssv15966790||
where 0&0.000131 are the EUR_AF values from nssv15966780&nssv15966790, in that order.
It would be a lot easier during the filtering step, for example when using filter_vep, if only one EUR_AF value were assigned.
For instance, suppose I get two EUR_AF values, one being 1 and the other 0.001, but I only want to keep SVs with EUR_AF < 0.1 in the filtering step. What is the general approach? Should I first filter the VCF containing the AF values, then annotate the query VCF and filter with filter_vep? Or should I just write a custom script that keeps the SV if at least one of the AF values meets the filtering criterion (EUR_AF < 0.1)?
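For the custom-script option, here is a minimal sketch of what I have in mind, not a tested pipeline. It splits the "&"-joined EUR_AF string from one CSQ entry and applies the threshold under one of two policies: "all" (every overlapping reference SV must be below the threshold) or "any" (at least one is enough). The field name SV_EUR_AF and the three-column CSQ format used in the example are hypothetical placeholders; the real field names would have to be taken from the "Format:" string in the CSQ header line of the annotated VCF. Treating an SV with no reference overlap as passing is also just my assumption.

```python
def parse_afs(value):
    """Split an '&'-joined annotation value such as '0&0.000131' into floats."""
    return [float(v) for v in value.split("&") if v not in ("", ".")]


def keep_sv(csq_entry, csq_format, af_field, threshold=0.1, policy="all"):
    """Decide whether one CSQ entry passes the EUR_AF filter.

    csq_entry  -- one comma-separated block of the CSQ INFO value
    csq_format -- the '|'-separated field names from the CSQ header line
    af_field   -- name of the field holding the AF values (hypothetical here)
    policy     -- 'all': every AF must be < threshold
                  'any': at least one AF must be < threshold
    """
    record = dict(zip(csq_format.split("|"), csq_entry.split("|")))
    afs = parse_afs(record.get(af_field, ""))
    if not afs:
        # Assumption: no overlapping reference SV means "not seen", so keep it.
        return True
    if policy == "all":
        return max(afs) < threshold
    if policy == "any":
        return min(afs) < threshold
    raise ValueError("policy must be 'all' or 'any'")


# Toy example with a shortened, hypothetical CSQ format.
fmt = "Allele|Consequence|SV_EUR_AF"
print(keep_sv("deletion|downstream_gene_variant|0&0.000131", fmt, "SV_EUR_AF"))
print(keep_sv("deletion|downstream_gene_variant|1&0.001", fmt, "SV_EUR_AF", policy="all"))
print(keep_sv("deletion|downstream_gene_variant|1&0.001", fmt, "SV_EUR_AF", policy="any"))
```

With the "all" policy the 1&0.001 case is dropped, because one overlapping reference SV is common; with "any" it is kept. Which policy is appropriate is exactly what I am unsure about.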