Why multiple SYMBOLS, Consequences... for Variant Effect Predictor (VEP)
8 weeks ago
gernophil ▴ 10

Hey everyone,

I have a question about the VEP results. Why do some variants have multiple values for features such as consequence, gene symbol, Ensembl ID...? And why does the number of values grow when I have more samples?

Shouldn't the gene be specified by the position on the genome?

An example is this (around 20 samples, after bcftools +split-vep):

CHROM   POS REF ALT ID  Consequence SYMBOL  Existing_variation  VARIANT_CLASS   Gene
17  81645307    G   A   .   intron_variant&non_coding_transcript_variant,non_coding_transcript_exon_variant,missense_variant,missense_variant,missense_variant&NMD_transcript_variant,regulatory_region_variant NPLOC4,TSPAN10,TSPAN10,TSPAN10,TSPAN10,.    rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617 SNV,SNV,SNV,SNV,SNV,SNV ENSG00000182446,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,.

The same variant with around 500 samples (including the above 17):

CHROM   POS REF ALT ID  Consequence SYMBOL  Existing_variation  VARIANT_CLASS   Gene
17  81645307    G   A   .   intron_variant&non_coding_transcript_variant,intron_variant&non_coding_transcript_variant,non_coding_transcript_exon_variant,non_coding_transcript_exon_variant,missense_variant,missense_variant,missense_variant,missense_variant,missense_variant&NMD_transcript_variant,missense_variant&NMD_transcript_variant,regulatory_region_variant,regulatory_region_variant NPLOC4,NPLOC4,TSPAN10,TSPAN10,TSPAN10,TSPAN10,TSPAN10,TSPAN10,TSPAN10,TSPAN10,.,.   rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617,rs6565617 SNV,SNV,SNV,SNV,SNV,SNV,SNV,SNV,SNV,SNV,SNV,SNV ENSG00000182446,ENSG00000182446,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,ENSG00000182612,.,.

One explanation I could think of for the multiple entries is that a variant can sit in a coding region of one gene and in a regulatory region of another. However, this does not explain why the number of entries differs between sample sizes. Can someone explain this to me? My VCFs are called per sample with HaplotypeCaller and then merged, and the merged VCF is then annotated.

8 weeks ago
barslmn ★ 1.2k

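Ensembl VEP annotates every combination of allele, gene and transcript, so a single variant gets one entry per overlapping feature, and that set is repeated for each allele. You can reduce this to one entry with the pick options (--pick, --pick_allele), or keep all entries and mark the chosen one with the flag options (--flag_pick, --flag_pick_allele).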
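To see the pattern in the question's data, here is a minimal Python sketch that counts the Consequence entries of both runs. The strings are copied from the two example rows; the helper name `count_entries` is made up for this illustration. It only shows that the 500-sample row holds each entry of the 20-sample row exactly twice, which would be consistent with the annotation set being repeated (for example for a second allele at that site in the merged VCF), but the output alone does not prove the cause.

```python
from collections import Counter

# Consequence field from the ~20-sample run (copied from the question).
small = (
    "intron_variant&non_coding_transcript_variant,"
    "non_coding_transcript_exon_variant,"
    "missense_variant,missense_variant,"
    "missense_variant&NMD_transcript_variant,"
    "regulatory_region_variant"
)

# Consequence field from the ~500-sample run (copied from the question).
large = (
    "intron_variant&non_coding_transcript_variant,"
    "intron_variant&non_coding_transcript_variant,"
    "non_coding_transcript_exon_variant,"
    "non_coding_transcript_exon_variant,"
    "missense_variant,missense_variant,missense_variant,missense_variant,"
    "missense_variant&NMD_transcript_variant,"
    "missense_variant&NMD_transcript_variant,"
    "regulatory_region_variant,regulatory_region_variant"
)


def count_entries(field: str) -> Counter:
    """Count how often each comma-separated annotation occurs."""
    return Counter(field.split(","))


small_counts = count_entries(small)
large_counts = count_entries(large)

# True if both rows contain the same annotations and every one of them
# occurs exactly twice as often in the larger run.
doubled = set(small_counts) == set(large_counts) and all(
    large_counts[key] == 2 * n for key, n in small_counts.items()
)

print(len(small.split(",")), len(large.split(",")), doubled)  # 6 12 True
```

So no new genes or transcripts appear with more samples; the same six annotations are simply listed twice.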


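If you add the flag options in VEP (for example --flag_pick), each annotation gets a PICK field. You can then explode the line into one row per annotation with the -d option of bcftools +split-vep and later select the annotations you're interested in with bcftools expressions like -i 'PICK~"1"'.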
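As a rough illustration of what exploding and then filtering does, here is a small Python sketch. It is not how bcftools works internally; `explode` is a hypothetical helper, and the PICK values are invented for the example (the question's output has no PICK column, so which annotation VEP would actually pick is unknown). The filter uses a plain equality check where the bcftools expression uses a regex match.

```python
def explode(record: dict, per_annotation_fields: list) -> list:
    """Turn one row with comma-separated annotation fields into one row
    per annotation, copying the remaining fields unchanged."""
    split = {name: record[name].split(",") for name in per_annotation_fields}
    lengths = {len(values) for values in split.values()}
    if len(lengths) != 1:
        raise ValueError("annotation fields have different numbers of entries")
    rows = []
    for i in range(lengths.pop()):
        row = {k: v for k, v in record.items() if k not in split}
        row.update({name: split[name][i] for name in per_annotation_fields})
        rows.append(row)
    return rows


# The 20-sample row from the question, reduced to a few columns.
record = {
    "CHROM": "17",
    "POS": 81645307,
    "REF": "G",
    "ALT": "A",
    "Consequence": (
        "intron_variant&non_coding_transcript_variant,"
        "non_coding_transcript_exon_variant,"
        "missense_variant,missense_variant,"
        "missense_variant&NMD_transcript_variant,"
        "regulatory_region_variant"
    ),
    "SYMBOL": "NPLOC4,TSPAN10,TSPAN10,TSPAN10,TSPAN10,.",
    # Hypothetical PICK values, invented for illustration only.
    "PICK": ".,.,1,.,.,.",
}

rows = explode(record, ["Consequence", "SYMBOL", "PICK"])

# Keep only the flagged annotation, similar in spirit to -i 'PICK~"1"'.
picked = [row for row in rows if row["PICK"] == "1"]

for row in picked:
    print(row["CHROM"], row["POS"], row["SYMBOL"], row["Consequence"])
```

With the made-up PICK values above, one row per annotation is produced (six in total) and a single TSPAN10 missense_variant row survives the filter.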


