VEP (SV annotation): how to choose the best overlapping reference SVs?
11 months ago
c.c.a. • 0

Hello biostars community,

I am using VEP (version 100) to annotate a VCF containing SVs with the AF values from a reference VCF. I am running VEP with the following parameters:

vep \
    --cache \
    --offline \
    --dir_cache ${cache} \
    --cache_version 98 \
    -i $INPUT \
    -o $OUTPUT \
    --$OFMT \
    --no_stats \
    --force_overwrite \
    --fork 4 \
    --compress_output bgzip \
    --regulatory \
    --symbol \
    --custom ${gnomadSVpath},gnomadSV01,vcf,exact,0,EUR_AF \
    --plugin StructuralVariantOverlap,file=$gnomadSVpath,cols=EUR_AF,match_type=surrounding,label=gnomadSV02

As for the reference SVs, I am using the gnomAD SV v2 (lifted over to hg38):

21      45802812        nssv15966780    A       <DUP>   .       .       DBVARID;SVTYPE=DUP;END=46109638;SVLEN=306827;EXPERIMENT=1;SAMPLESET=1;REGIONID=nsv4284975;AC=1;AFR_AC=1;AMR_AC=0;EAS_AC=0;EUR_AC=0;OTH_AC=0;AF=4.6e-05;AFR_AF=0.000105;AMR_AF=0;EAS_AF=0;EUR_AF=0;OTH_AF=0;AN=21694;AFR_AN=9534;AMR_AN=1930;EAS_AN=2416;EUR_AN=7624;OTH_AN=190
21      46008428        nssv15966790    C       <DUP>   .       .       DBVARID;SVTYPE=DUP;END=46181422;SVLEN=172995;EXPERIMENT=1;SAMPLESET=1;REGIONID=nsv4273892;AC=2;AFR_AC=1;AMR_AC=0;EAS_AC=0;EUR_AC=1;OTH_AC=0;AF=9.2e-05;AFR_AF=0.000105;AMR_AF=0;EAS_AF=0;EUR_AF=0.000131;OTH_AF=0;AN=21694;AFR_AN=9534;AMR_AN=1930;EAS_AN=2416;EUR_AN=7624;OTH_AN=190

After annotation I get the following results:

chr21   46059888        chr21_46059889_deletion N       <DEL>   .       .       END=46059946;CSQ=deletion|downstream_gene_variant|MODIFIER|AP001476.2|ENSG00000226115|Transcript|ENST00000435738|lncRNA|||||||||||2322|1||Clone_based_ensembl_gene||||||||0&0.000131|100&100|nssv15966780&nssv15966790||,deletion|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000665311|||||||||||||||||||||||0&0.000131|100&100|nssv15966780&nssv15966790||

where 0&0.000131 are the EUR_AF values from nssv15966780 and nssv15966790, respectively. Filtering (for example with filter_vep) would be much easier if only one EUR_AF value were assigned per variant.
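To make the multi-valued field concrete, here is a minimal Python sketch that splits the CSQ entries of an annotated record and pairs each reference SV ID with its EUR_AF. The INFO string is an abbreviated version of the record above (most empty fields dropped), and the position AF_POS = -5 is only read off that example. In a real file the position should be looked up by name in the "Format:" list of the CSQ header line instead.

```python
def csq_entries(info):
    """Return the CSQ entries of a VCF INFO string, each split on '|'."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            # One entry per overlapping feature, separated by commas.
            return [entry.split("|") for entry in item[len("CSQ="):].split(",")]
    return []


# Abbreviated from the annotated record above (assumption: empty fields removed).
info = (
    "END=46059946;CSQ="
    "deletion|downstream_gene_variant|MODIFIER|"
    "0&0.000131|100&100|nssv15966780&nssv15966790||,"
    "deletion|regulatory_region_variant|MODIFIER|"
    "0&0.000131|100&100|nssv15966780&nssv15966790||"
)

# Assumption: EUR_AF is the 5th field from the end, the IDs the 3rd from the end.
AF_POS = -5
ID_POS = -3

for fields in csq_entries(info):
    afs = fields[AF_POS].split("&")
    ids = fields[ID_POS].split("&")
    # Pair each overlapping reference SV with its own EUR_AF value.
    print(fields[1], dict(zip(ids, afs)))
```

Each CSQ entry carries the same pair of values, so the collapsing decision only has to be made once per variant.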

For instance, suppose I get two EUR_AF values, say 1 and 0.001, but I only want to keep SVs with EUR_AF < 0.1 in the filtering step. What is the general approach? Should I first filter the reference VCF containing the AF values, then annotate the query VCF and filter with filter_vep? Or should I write a custom script that keeps the SV if at least one of the AF values meets the filtering criterion (EUR_AF < 0.1)?
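For the custom-script option, a rough Python sketch of the collapsing step might look like the following. The function names, the "max"/"min" policies, and the choice to keep variants with no overlapping reference SV are all assumptions made for illustration, not VEP behaviour. Taking the maximum is the conservative choice (drop the SV if any overlapping reference SV is common), while taking the minimum corresponds to "keep the SV if at least one AF value passes".

```python
def collapse_af(raw, how="max"):
    """Collapse an '&'-joined AF string such as '0&0.000131' into one float.

    Returns None when the field is empty, i.e. no reference SV overlapped.
    """
    values = []
    for token in raw.split("&"):
        token = token.strip()
        if token in ("", "."):
            continue  # skip missing values
        values.append(float(token))
    if not values:
        return None
    return max(values) if how == "max" else min(values)


def passes(raw, threshold=0.1, how="max"):
    """True if the collapsed AF is below the threshold.

    Assumption: a variant without any overlapping reference SV is kept.
    """
    af = collapse_af(raw, how)
    return af is None or af < threshold


# The example from the question: one overlap with AF 1, another with AF 0.001.
print(passes("1&0.001", how="max"))  # conservative: any common overlap rejects
print(passes("1&0.001", how="min"))  # permissive: any rare overlap keeps
```

Which policy is appropriate depends on how closely each overlapping reference SV matches the query SV, so it may be worth tightening the overlap criteria during annotation before collapsing at all.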

