Hello everyone, I'm trying to annotate a VCF file with VEP and pull in the information from the ClinVar database.
The ClinVar database has a field corresponding to the ACMG classification (if I understand this correctly); for example, the mutation at 140749365 in BRAF is tagged "Pathogenic" in ClinVar ( https://www.ncbi.nlm.nih.gov/clinvar?term=((174176[AlleleID])OR(29020[AlleleID])) ). I assumed that this information could be integrated into the VEP output. Nevertheless, for some test cases I'm getting an empty CLIN_SIG field, even with the "--everything" flag (as well as --check_existing). I'm not sure what the problem is and am actually out of good guesses :(
The input line in the VCF file that gives an empty CLIN_SIG field:
chr7 140749365 . G A 50 PASS
On the other hand, an input line like this:
chr7 140753339 . G A 50 PASS
gives me a correctly filled CLIN_SIG field.
Some other VEP output fields are also quite cryptic: for example, the PHENO field has a value of "1&1,A", which is supposed to be linked to the "Existing_variation" field ("CM092083&COSV56058494"), but I'm not sure how to interpret it (it seems to be a key -> value code, but I do not know where it is described).
Hope for some ideas! Best wishes, Eugene
I do understand that most variants do not have a clinical significance; the problem is that both variants in my example do (according to the ClinVar website). But I'm getting the correct (again, according to the ClinVar website) annotation for only one of them. I initially thought it might be due to an outdated version of ClinVar being used with VEP, but creating a custom annotation with the latest ClinVar release did not change anything :(
At least in my hands your first variant is generating a synonymous variant (BRAF D638D), which is not the same thing as the listed pathogenic variant.
If you want to check the actual 'raw' data, it is located here: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/
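For anyone who wants to check the raw data directly, here is a minimal Python sketch of looking up the CLNSIG value of a variant in ClinVar VCF records. It assumes the ClinVar VCF layout (chromosome names without the "chr" prefix, a CLNSIG key in the INFO column). The function names and the toy records are made up for illustration and are not real ClinVar entries.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'A=1;B=x;FLAG' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def find_clnsig(vcf_lines, chrom, pos, ref, alt):
    """Return CLNSIG for the record matching chrom/pos/ref/alt, or None."""
    # Assumption: the ClinVar VCF names chromosomes "7", not "chr7".
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        same_site = cols[0] == chrom and int(cols[1]) == pos
        same_alleles = cols[3] == ref and alt in cols[4].split(",")
        if same_site and same_alleles:
            return parse_info(cols[7]).get("CLNSIG")
    return None


# Toy records, invented for illustration only (not real ClinVar data).
TOY_RECORDS = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tG\tA\t.\t.\tALLELEID=1;CLNSIG=Pathogenic",
    "1\t200\t.\tC\tT\t.\t.\tALLELEID=2;CLNSIG=Benign",
]

if __name__ == "__main__":
    print(find_clnsig(TOY_RECORDS, "chr1", 100, "G", "A"))
```

To run it on a downloaded file, pass an open text handle instead of the toy list, for example one returned by gzip.open(path, "rt") from the standard library, where path is wherever you saved the ClinVar VCF. A lookup that returns None for the exact REF/ALT you put in your VCF is a hint that the alleles do not match what ClinVar lists at that position.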
My bad - I generated this toy example by hand from ClinVar and did not notice that BRAF is on the opposite strand, so the nucleotide change from ClinVar has to be reverse-complemented before putting it in the VCF :(
Hi, I'm investigating OpenCRAVAT - thank you for pointing me to it! Moreover, I will probably insert it into the pipeline I am building for visualization/filtering. I've also noticed something strange concerning the coding/non-coding visualizations.
My test VCF currently contains only non-coding SNPs according to VEP (also, if I filter in CRAVAT on the "coding" field in the "Filter" tab, I get 0/52 SNPs).
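In case it helps someone who hits the same thing, here is a small Python sketch of the strand conversion. It assumes the alleles were copied from a transcript-level (coding-strand) description and that you know which strand the gene is on; the function names are just illustrative.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_forward_strand(ref, alt, gene_strand):
    """Convert transcript-strand alleles to forward-strand VCF REF/ALT.

    gene_strand is "+" or "-". VCF alleles are always written on the
    forward strand of the reference, so minus-strand genes need flipping.
    """
    if gene_strand == "+":
        return ref, alt
    if gene_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError("gene_strand must be '+' or '-'")


if __name__ == "__main__":
    # A T>A change described on a minus-strand transcript.
    print(to_forward_strand("T", "A", "-"))
```

For single-nucleotide changes this reduces to complementing each allele; reversing only matters for multi-base alleles, but doing both keeps the helper correct for either case.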
BUT on the summary tab CRAVAT draws the following diagrams:
It seems to me that CRAVAT assigns "intergenic" SNPs to "coding"?
Hi, a lead architect of OpenCRAVAT here. It's a bug in "Coding vs Noncoding Summary" widget. Sequence ontology terms have been evolving and the widget did not catch up with the change. We'll fix it and publish a fixed version shortly.
Hi, thanks for the clarification. I thought this was a bug, and I'm glad that the tool is improving and evolving; it's actually really cool software!
While I have the chance, I'd like to ask about tool performance: I'm running the tool on a cluster and I see that by default CRAVAT uses all cores at the mapping step, while all other steps seem to run on one core. Is it possible to speed things up? Load everything into memory or something like that (I think that disk speed is the limit there)?
Thanks. In OpenCRAVAT (OC) 1.8.0, if multiple annotators are requested to be run, they can be run on multiple cores, but still one annotator will be run on one core. And, other steps such as aggregator are run on one core. OC started as a single core program and we have been adding multicore support to more steps of it, so fully utilizing multicores in all of its steps is definitely the direction. By the way, the maximum number of cores to use can be set in OC's setting. See "number of concurrent annotations per job" in https://github.com/KarchinLab/open-cravat/wiki/5.-GUI-usage#system-setting.
Indeed, loading the annotation database into memory can improve annotation speed, since most annotators are I/O-bound. Some annotators' databases are small enough to load into memory, while others are too big. If you can let me know which annotators you are using, we can examine them.
My experience is similar in that disk speed can limit the speed of annotation. If you have a machine with an SSD you'll definitely get a speed boost. See the wiki for more detail: https://github.com/KarchinLab/open-cravat/wiki#system-capabilities .
Run "oc module install wgcodingvsnoncodingsummary" to install the fixed version of the widget and see if the problem is gone.
Hi, I'd like to ask one more question concerning the CRAVAT system: I need to annotate my variants with the exon number; in particular, I'm interested in whether a given SNP falls in the first or the last exon of the transcript (I'm currently adapting the InterVar code to work with OpenCRAVAT output to get an ACMG annotation).
I can get the list of exons for all transcripts from Ensembl BioMart (https://www.ensembl.org/biomart/martview/) and then prepare a separate annotator with the exon structure, but it turns out that CRAVAT uses slightly outdated transcript versions (for example, ENST00000379389.4 is no longer in the database, and ENST00000379370 has version .6 in CRAVAT and .7 on the website).
This is most likely not a problem for my purpose in the majority of cases, but I'm wondering: 1) Is there any way to manually update CRAVAT to a newer version of Ensembl? (As far as I can guess, some files in the common or mapper modules would have to be updated?) 2) Is there any other way to get the exon structure from CRAVAT? (Maybe it is already there, but I've missed it.)
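To make the first/last-exon check concrete, here is a minimal Python sketch of what such an annotator would compute. It assumes you already have, per transcript, a list of exon intervals as 1-based inclusive genomic coordinates plus the transcript strand (for example exported from BioMart); the function names and the example coordinates are hypothetical. Transcript IDs are compared without the version suffix to sidestep the .6 vs .7 mismatch, which is only safe if the exon structure did not change between versions.

```python
def strip_version(transcript_id):
    """'ENST00000379370.6' -> 'ENST00000379370'."""
    return transcript_id.split(".", 1)[0]


def exon_rank(pos, exons, strand):
    """Return (rank, total) of the exon containing pos, or None.

    exons  : list of (start, end) tuples, 1-based inclusive, non-overlapping
    strand : "+" or "-"; rank 1 is the exon at the 5' end of the transcript
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # On the minus strand the first exon has the highest coordinates.
    ordered = sorted(exons, reverse=(strand == "-"))
    for rank, (start, end) in enumerate(ordered, start=1):
        if start <= pos <= end:
            return rank, len(ordered)
    return None  # intronic or outside the transcript


def in_first_or_last_exon(pos, exons, strand):
    """True/False if pos is exonic, None if it is not in any exon."""
    hit = exon_rank(pos, exons, strand)
    if hit is None:
        return None
    rank, total = hit
    return rank == 1 or rank == total


if __name__ == "__main__":
    toy_exons = [(100, 200), (300, 400), (500, 600)]  # made-up coordinates
    print(exon_rank(350, toy_exons, "-"))
```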
Cannot really delve into the details without having the data in front of me; however, there are many instances whereby a variant can be regarded as both intergenic and also 'coding'. Think of splice-isoforms, which can vary in length by megabases. Some isoforms even span multiple genes. The 'fluid' human genome is a microcosm of evolution in its own right - every eventuality exists.
I do understand how one SNP can be both intergenic and coding; the problem is the inconsistency within OpenCRAVAT (maybe I did not explain it clearly enough). Here is the Filter tab:
I really do not understand how this could be consistent with the previous diagram. The only explanation I see is that the word "coding" means different things in the "Summary" and "Filter" tabs.
Probably a question for CRAVAT