VEP fails to annotate with NCBI RefSeq GFF
10 weeks ago
bharata1803 ▴ 560

Hi all,

I am using the VEP Docker image and have downloaded the Ensembl homo_sapiens cache. Both the cache and the Docker image are VEP 110 (Docker version 110.1, docker link).

To annotate with RefSeq, I downloaded the GFF and BAM files from this link and use these parameters:

--gff [path]/GCF_000001405.40_GRCh38.p14_genomic.sorted.gff.gz

--bam [path]/GCF_000001405.40_GRCh38.p14_knownrefseq_alns.bam

--fasta [path]/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
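For reference, a complete invocation built around these three options might look like the sketch below. It is only an illustration: the function name `run_vep_gff`, the `VEP_BIN` override, and the input, output, and synonyms arguments are assumptions that do not appear in the post, and any cache-related options used alongside are omitted.

```shell
#!/usr/bin/env bash
# Sketch of a VEP run that annotates from a RefSeq GFF plus the matching BAM.
# VEP_BIN, the function name, and its arguments are illustrative, not from the post.
run_vep_gff() {
    local input_vcf="$1" output_vcf="$2" data_dir="$3" synonyms="$4"
    # VEP_BIN defaults to "vep"; override it to point at another executable.
    "${VEP_BIN:-vep}" \
        --input_file "$input_vcf" \
        --output_file "$output_vcf" \
        --vcf \
        --gff "$data_dir/GCF_000001405.40_GRCh38.p14_genomic.sorted.gff.gz" \
        --bam "$data_dir/GCF_000001405.40_GRCh38.p14_knownrefseq_alns.bam" \
        --fasta "$data_dir/Homo_sapiens.GRCh38.dna.toplevel.fa.gz" \
        --synonyms "$synonyms"
}
```

Inside the container this could be called as `run_vep_gff input.vcf output.vcf [path] [synonyms file]`, with the placeholders filled in for the local setup.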

I also supply the synonyms file from the cache folder. From the header of the output VCF file, I can see that the GFF file is being read:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|CCDS|ENSP|GIVEN_REF|USED_REF|BAM_EDIT|SOURCE|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|HGVSg|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|GCF_000001405.40_GRCh38.p14_genomic.sorted.gff.gz">

But I notice that no annotations use RefSeq transcripts, such as NM_*. When I use the Ensembl Docker image version 109.3, it works fine and shows RefSeq annotations. I have also tried Docker version 111.0, but it behaves the same as 110.1: I don't get any RefSeq annotations. What happened here? Were there any changes between 109.3 and 110.1 that make the RefSeq GFF unusable?
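One quick way to compare the output of different VEP versions is to count the variant lines that mention a RefSeq-style transcript accession. The helper below is a rough sketch: the function name `count_refseq_lines` is made up for illustration, it assumes an uncompressed VCF, and it matches accession prefixes anywhere on the line rather than parsing the CSQ field.

```shell
#!/usr/bin/env bash
# Count VCF data lines that mention a RefSeq-style transcript accession
# (NM_, NR_, XM_ or XR_ followed by digits). Header lines are skipped.
count_refseq_lines() {
    local vcf="$1"
    # grep -c exits non-zero when the count is 0; "|| true" keeps that
    # from aborting scripts that run under "set -e".
    grep -v '^#' "$vcf" | grep -cE '(NM|NR|XM|XR)_[0-9]+' || true
}
```

Running it on the output of each Docker version, for example `count_refseq_lines output_109.vcf` and `count_refseq_lines output_110.vcf` (hypothetical file names), should make the difference visible at a glance.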

Note: This is a bug in VEP; it has been confirmed in this issue: link

VEP variant • 235 views
You need to download the RefSeq cache and use the --refseq option when running VEP.
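A cache-based run along these lines might look like the following sketch. It assumes the RefSeq cache for GRCh38 (distributed by Ensembl as homo_sapiens_refseq) has already been installed into the cache directory; the function name `run_vep_refseq`, the `VEP_BIN` override, and the argument names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: annotate against an installed RefSeq cache instead of a GFF.
# VEP_BIN, the function name, and its arguments are illustrative assumptions.
run_vep_refseq() {
    local input_vcf="$1" output_vcf="$2" cache_dir="$3"
    # VEP_BIN defaults to "vep"; override it to point at another executable.
    "${VEP_BIN:-vep}" \
        --input_file "$input_vcf" \
        --output_file "$output_vcf" \
        --vcf \
        --offline --cache --refseq \
        --dir_cache "$cache_dir" \
        --species homo_sapiens --assembly GRCh38
}
```

For example, `run_vep_refseq input.vcf output.vcf [path to cache]` would write a VCF whose CSQ entries are based on RefSeq transcripts, provided the cache version matches the VEP version.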


