Question: Get ClinVar info for SNP with VEP
Eugene A90 wrote:

Hello everyone, I'm trying to annotate a VCF file with VEP and get the information from the ClinVar database.

The ClinVar database has a field corresponding to ACMG (if I understand this correctly); for example, the mutation at position 140749365 in BRAF is tagged "Pathogenic" in ClinVar. I assumed that this information could be integrated into the VEP output. Nevertheless, I'm getting an empty CLIN_SIG field for some test cases, even when using the "--everything" flag (as well as the --check_existing flag). I'm not sure what the problem is here and I'm out of good guesses :(

The input line in the VCF file that yields an empty CLIN_SIG field:

chr7    140749365       .       G       A       50      PASS

On the other hand, input line like this:

chr7    140753339       .       G       A       50      PASS

gives me a correctly filled CLIN_SIG field.
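To compare the two records directly, one option is to pull CLIN_SIG out of the CSQ annotation that VEP writes when it produces VCF output. The following is a minimal sketch, assuming VCF output in which the CSQ sub-field order is declared after "Format: " in the header; the function names and the toy records in the usage note are hypothetical and not taken from the thread.

```python
def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            # The sub-field order is listed after "Format: " in the description.
            fmt = line.split("Format: ", 1)[1]
            return fmt.rstrip('">\n').split("|")
    raise ValueError("no CSQ header line found")


def clin_sig_per_record(vcf_lines):
    """Yield (chrom, pos, ref, alt, [CLIN_SIG per CSQ entry]) for each record."""
    lines = list(vcf_lines)
    fields = csq_fields(l for l in lines if l.startswith("##"))
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        # INFO is a ';'-separated list of KEY=VALUE pairs; pick out CSQ.
        csq = next((kv[4:] for kv in info.split(";") if kv.startswith("CSQ=")), "")
        values = []
        for entry in csq.split(","):  # one entry per transcript/allele
            if not entry:
                continue
            record = dict(zip(fields, entry.split("|")))
            values.append(record.get("CLIN_SIG", ""))
        yield chrom, pos, ref, alt, values
```

Calling `list(clin_sig_per_record(open("annotated.vcf")))` (the file name is a placeholder) lists, for every record, the CLIN_SIG value of each consequence entry, so an empty string shows at a glance which input lines came back without a ClinVar assertion.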

Other VEP output fields are also quite cryptic: for example, the PHENO field has a value of "1&1,A", which is supposed to be linked to the "Existing_variation" field ("CM092083&COSV56058494"), but I'm not sure how to interpret it (it seems to be a key -> value code, but I do not know where it is described).

Hope for some ideas! Best wishes, Eugene
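On the PHENO question: as far as the VEP output documentation describes it, per-variant flags such as PHENO and SOMATIC hold one value per co-located variant, joined with "&" in the same order as the identifiers in Existing_variation. The sketch below pairs them up positionally under that assumption. The helper name is hypothetical, and the trailing ",A" in the quoted value is not interpreted here.

```python
def pair_existing_flags(existing_variation, flags):
    """Pair '&'-joined variant IDs with '&'-joined per-variant flags."""
    ids = existing_variation.split("&") if existing_variation else []
    values = flags.split("&") if flags else []
    if len(ids) != len(values):
        raise ValueError(
            f"{len(ids)} identifiers but {len(values)} flag values"
        )
    # Positional pairing: the n-th flag belongs to the n-th identifier.
    return dict(zip(ids, values))


if __name__ == "__main__":
    print(pair_existing_flags("CM092083&COSV56058494", "1&1"))
```

Under this reading, "1&1" would mean that both CM092083 and COSV56058494 are flagged as having an associated phenotype, disease or trait.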

snp vep • 528 views
ADD COMMENTlink modified 9 months ago by Kevin Blighe71k • written 9 months ago by Eugene A90
Collin880 wrote:

Many variants will likely produce an empty ClinVar field, because ClinVar pathogenicity assertions are available for only a small number of variants. However, you could cross-reference the annotations with another variant annotator just to be sure. For example, with OpenCRAVAT you could submit directly to the web server or run the command-line tool.

ADD COMMENTlink written 9 months ago by Collin880

I do understand that most variants do not have a clinical significance; the problem is that both variants in my example do (according to the ClinVar website), but I'm getting the correct annotation (again according to the ClinVar website) for only one of them. I initially thought that it might be due to an outdated version of ClinVar that I'm using with VEP, but creating a custom annotation with the latest ClinVar release did not change anything :(

ADD REPLYlink written 9 months ago by Eugene A90

At least in my hands your first variant is generating a synonymous variant (BRAF D638D), which is not the same thing as the listed pathogenic variant.

ADD REPLYlink written 9 months ago by Collin880

If you want to check the actual 'raw' data, it is located here:

ADD REPLYlink modified 9 months ago • written 9 months ago by Kevin Blighe71k

My bad - I generated this toy example by hand from ClinVar and did not notice that BRAF is on the opposite strand, so the nucleotide change in ClinVar has to be reverse-complemented before putting it in the VCF :(
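The strand fix described here can be sketched in a few lines. This is an illustrative helper only (the function names are made up): it converts alleles written on the transcript strand into the forward-strand alleles that a VCF expects. It handles the alleles alone; for indels the position would also need adjusting.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward_strand(ref, alt, gene_strand):
    """Convert transcript-strand alleles to forward-strand (VCF) alleles."""
    if gene_strand == "+":
        return ref, alt
    if gene_strand == "-":
        # Genes on the minus strand are described on the opposite strand,
        # so both alleles must be reverse-complemented for the VCF.
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError(f"strand must be '+' or '-', got {gene_strand!r}")


if __name__ == "__main__":
    # A C>T change described on a minus-strand transcript.
    print(to_forward_strand("C", "T", "-"))
```

For a single-nucleotide change the reverse complement is simply the complement, so a C>T change on a minus-strand transcript is written as G>A in the VCF.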

ADD REPLYlink written 9 months ago by Eugene A90

Hi, I'm investigating OpenCRAVAT - thank you for pointing me to it! Moreover, I will probably insert it into the pipeline I am building for visualization/filtration. I've also noticed something strange concerning the coding/non-coding visualizations.

My test VCF currently contains only non-coding SNPs according to VEP (also, if I filter with CRAVAT based on the "coding" field in the "filter" tab, I get 0/52 SNPs). [image not preserved]

BUT on the summary tab CRAVAT draws the following diagrams:

[image not preserved]

It seems to me that CRAVAT assigns "intergenic" SNPs to "coding"?

Best, Eugene

ADD REPLYlink modified 9 months ago • written 9 months ago by Eugene A90

Hi, a lead architect of OpenCRAVAT here. It's a bug in the "Coding vs Noncoding Summary" widget. Sequence Ontology terms have been evolving and the widget did not keep up with the change. We'll fix it and publish a fixed version shortly.

ADD REPLYlink written 9 months ago by slcrick240

Hi, thanks for the clarification. I thought that this was a bug, and I'm glad that the tool is improving and evolving; it's actually really cool software!

While I have the chance, I want to ask about tool performance: I'm running the tool on a cluster and I see that by default CRAVAT uses all cores at the mapping step, while all other steps seem to run on one core. Is it possible to speed things up, for example by loading everything into memory (I think that disk speed is the limit there)?


ADD REPLYlink written 9 months ago by Eugene A90

Thanks. In OpenCRAVAT (OC) 1.8.0, if multiple annotators are requested, they can be run on multiple cores, but each annotator still runs on one core. Other steps, such as the aggregator, also run on one core. OC started as a single-core program and we have been adding multicore support to more of its steps, so fully utilizing multiple cores in all steps is definitely the direction. By the way, the maximum number of cores to use can be set in OC's settings. See "number of concurrent annotations per job" in

Indeed, loading the annotation database into memory can improve annotation speed, since most annotators are I/O-bound. Some annotators' databases are small enough to load into memory, while others are too big. If you let me know which annotators you are using, we can examine them.
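As a generic illustration of the "load the database into memory" idea, the sketch below copies an on-disk SQLite file into an in-memory connection with Python's standard `sqlite3` module. It assumes the annotation data sits in a SQLite file that fits in RAM, and it is not OpenCRAVAT's own mechanism; the function name and any path passed to it are hypothetical.

```python
import sqlite3


def load_into_memory(db_path):
    """Copy an on-disk SQLite database into an in-memory connection."""
    disk = sqlite3.connect(db_path)
    mem = sqlite3.connect(":memory:")
    try:
        # Connection.backup (Python 3.7+) copies every page of the source
        # database into the target connection.
        disk.backup(mem)
    finally:
        disk.close()
    return mem
```

After the one-off copy, lookups through the returned connection no longer touch the disk, which is where an I/O-bound annotator would be expected to gain. The cost is RAM roughly equal to the size of the database file.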

ADD REPLYlink modified 9 months ago • written 9 months ago by slcrick240

My experience is similar in that disk speed can limit the speed of annotation. If you have a machine with an SSD you'll definitely get a speed boost. See the wiki for more detail: .

ADD REPLYlink written 9 months ago by Collin880

Please run oc module install wgcodingvsnoncodingsummary to install the fixed version of the widget and see if the problem is gone.

ADD REPLYlink written 9 months ago by slcrick240

Hi, I'd like to ask one more question concerning the CRAVAT system: I need to annotate my variants with the exon number; in particular, I'm interested in whether a given SNP falls in the first or the last exon of the transcript (currently I'm adapting the InterVar code to work with OpenCRAVAT output to get an ACMG annotation).

I can get the list of exons for all transcripts from Ensembl BioMart (, and then prepare a separate annotator with the exon structure for this, but it turns out that CRAVAT uses slightly outdated transcript versions (for example, ENST00000379389.4 is no longer in the database, and ENST00000379370 has version .6 in CRAVAT and .7 on the website).

This is most likely not a problem for my purposes in the majority of cases, but I'm wondering: 1) Is there any way to manually update CRAVAT to a newer version of Ensembl? (As far as I can guess, some files in the common or mapper module have to be updated?) 2) Is there any other way to get the exon structure from CRAVAT (maybe it is already there, but I've missed it)?

Best, Eugene
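The first-or-last-exon check described above could look roughly like the sketch below, given a transcript's exon coordinates (for instance from a BioMart export). It assumes 1-based inclusive genomic coordinates and a known strand; the function names and the toy coordinates in the usage note are hypothetical and do not describe any real transcript.

```python
def exon_rank(exons, pos, strand):
    """Return (rank, total) of the exon containing pos, or None.

    exons: list of (start, end) genomic coordinates, 1-based inclusive.
    Rank 1 is the first exon in transcription order.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # On the minus strand, transcription runs from high to low coordinates.
    ordered = sorted(exons, reverse=(strand == "-"))
    for rank, (start, end) in enumerate(ordered, start=1):
        if start <= pos <= end:
            return rank, len(ordered)
    return None


def first_or_last(exons, pos, strand):
    """Classify pos as 'first', 'last', 'internal', or None if not exonic."""
    hit = exon_rank(exons, pos, strand)
    if hit is None:
        return None
    rank, total = hit
    if rank == 1:
        return "first"  # a single-exon transcript is reported as 'first'
    if rank == total:
        return "last"
    return "internal"
```

With toy exons `[(100, 200), (300, 400), (500, 600)]`, position 550 falls in the last exon on the plus strand and in the first exon on the minus strand. This is the same strand subtlety that caused the BRAF confusion earlier in the thread.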

ADD REPLYlink written 7 months ago by Eugene A90

Cannot really delve into the details without having the data in front of me; however, there are many instances whereby a variant can be regarded as both intergenic and also 'coding'. Think of splice-isoforms, which can vary in length by megabases. Some isoforms even span multiple genes. The 'fluid' human genome is a microcosm of evolution in its own right - every eventuality exists.

ADD REPLYlink written 9 months ago by Kevin Blighe71k

I do understand how one SNP can be both intergenic and coding; the problem is the inconsistency in OpenCRAVAT (maybe I did not explain it clearly enough). Here is the filtering tab: [image: "filter" tab]

I really do not understand how it could be connected to the previous diagram. The only explanation I see is a different meaning of the word "coding" in the "Summary" and "Filter" tabs.

ADD REPLYlink modified 9 months ago • written 9 months ago by Eugene A90

Probably a question for CRAVAT

ADD REPLYlink written 9 months ago by Kevin Blighe71k