VEP - What is the best way to start an analysis?
Hi all,

I have not worked with the VEP software before, but I need some of its outputs. Unfortunately, I could not work out how to run the analysis from the guide. So, what is the best way to get started?

Why not install it locally and try out the examples?

I have installed it, but I do not know exactly what the first step is. I guess I should first annotate my VCF file using the script below. Is my guess right?

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz

What zx8754 said: did you try only the "Quick start" (on the right of https://www.ensembl.org/info/docs/tools/vep/script/index.html)?

Hi, I have this problem after installing VEP: I get these errors when running it. (I'm helping Mostafa, by the way.)

Can't locate Try/Tiny.pm in @INC (@INC contains: /home/sadri/vep/ensembl-vep/modules /home/sadri/vep/ensembl-vep /opt/miRDeep2/mirdeep2/lib/perl5/x86_64-linux-thread-multi /opt/miRDeep2/mirdeep2/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Feature.pm line 85.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Feature.pm line 85.
Compilation failed in require at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/BaseVariationFeature.pm line 58.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/BaseVariationFeature.pm line 58.
Compilation failed in require at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/VariationFeature.pm line 97.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/VariationFeature.pm line 97.
Compilation failed in require at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/DBSQL/VariationFeatureAdaptor.pm line 89.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/Bio/EnsEMBL/Variation/DBSQL/VariationFeatureAdaptor.pm line 89.
Compilation failed in require at /home/sadri/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 59.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseVEP.pm line 59.
Compilation failed in require at (eval 7) line 3.
...propagated at /usr/share/perl5/base.pm line 94.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm line 56.
Compilation failed in require at (eval 6) line 3.
...propagated at /usr/share/perl5/base.pm line 94.
BEGIN failed--compilation aborted at /home/sadri/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm line 71.
Compilation failed in require at ./vep line 20.
BEGIN failed--compilation aborted at ./vep line 20.

What command are you running when you get that error?

I am getting the same type of error when I run ./vep --custom. Unfortunately I don't have root access, so I can't fix the error using sadri's solution. Can anyone help?

I found the solution: yum install --enablerepo=rpmforge perl-Try-Tiny
Thanks.

Mostafa asks: does the input VCF file need to be annotated? If so, how should I annotate my file? Thanks.

The VCF should be a standard VCF. The VEP will only take into account the location and alleles. If you specify --vcf as the output format, you will retain everything that is already in your VCF, and the VEP will add its data to the INFO column.

The basic commands are in the documentation:

./vep --cache -i input.txt -o output.txt

Is it working when you run that with the example files that ship with the VEP?

Unfortunately, no. The error is related to the cache files. Another question: my organism is buffalo, and there is no cache for it in the cache folder. Can I use the cache of another organism?

When you run the command with the example files, what error do you get? There is no buffalo genome in Ensembl, so you will need to work with your own data. But we should fix the installation before we worry about that.
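The point that VEP only looks at the location and alleles of each VCF record can be made concrete with a small Python sketch. It is an illustration, not VEP's own code: it pulls CHROM, POS, REF and ALT from one VCF data line and builds a CHROM_POS_REF/ALT label. That label format is an assumption modelled on the Uploaded_variation values VEP printed later in this thread; the helper names and the example line are made up.

```python
def vep_input_fields(vcf_line):
    """Return (chrom, pos, ref, alts) from one tab-separated VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 5:
        raise ValueError("a VCF data line needs at least 5 tab-separated columns")
    chrom, pos, _vcf_id, ref, alt = cols[:5]
    # ALT can hold several comma-separated alleles.
    return chrom, int(pos), ref, alt.split(",")


def variant_label(chrom, pos, ref, alt):
    # Assumed format (hypothetical helper), modelled on the Uploaded_variation
    # labels seen in this thread, e.g. CM009840.1_932_C/A.
    return f"{chrom}_{pos}_{ref}/{alt}"


if __name__ == "__main__":
    # Made-up example line: only CHROM, POS, REF and ALT matter here.
    example = "CM009840.1\t932\t.\tC\tA\t.\t.\t.\n"
    chrom, pos, ref, alts = vep_input_fields(example)
    for alt in alts:
        print(variant_label(chrom, pos, ref, alt))
```

Run as a script, this prints CM009840.1_932_C/A. Everything after the ALT column (QUAL, FILTER, INFO, samples) is ignored, which is why an unannotated, standard VCF is enough as input.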
Hi Emily, unfortunately I've been struggling with VEP for days. You asked whether VEP works correctly for me. I think I installed it correctly; please see below. Was the installation done correctly?

It's impossible to read what's on the console. Can you please copy-paste the text rather than a screenshot of the console?

Yes, sure:

which: no tabix in (/opt/vep/ensembl-vep:/opth/hadoop/hadoop-2.7.3/bin:/opth/hadoop/hadoop-2.7.3/sbin:/opt/Mathematica/11.0/SystemFiles/Libraries/Linux-x86-64/:/opt/Mathematica/11.0/Executables:/opt/intel/composer_xe_2015.0.090/bin/intel64:/opt/torque/bin:/opt/torque/sbin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/torque/sbin/:/opt/torque/bin/:/opt/maui/bin/:/opt/maui/sbin/:/opt/gold/bin:/opt/torque/sbin/:/opt/torque/bin/:/opt/mireap/viennarna/share/perl5/:/opt/maui/bin/:/opt/maui/sbin/:/opt/boost/boost-installed:/opt/MATLAB/MATLAB_Production_Server/R2013a/toolbox/distcomp/bin/:/opt/cuda/bin:/home/m.rafiepour222/bin)
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 94.5c08d90
  ensembl-funcgen      : 94.08b0c13
  ensembl-io           : 94.8d53275
  ensembl-variation    : 94.066b102
  ensembl-vep          : 94.4

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage: ./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options.
                       See web documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

The error is on the first line: "which: no tabix". Install tabix (it is part of htslib, which also provides bgzip) and try again?

OK. Is it possible for you to send me the installation link?

I'm glad to see you've solved it. These are issues where you can show (and have shown) that you've invested your own effort. Remember, asking for a download link is like using the forum as Google, which is not encouraged.

Many thanks for your guidance. As I said above, my organism is buffalo, and there is no cache for it in VEP's cache folder. The VEP documentation does not explain how to create a cache, so how can I generate one?

Emily is the better person to tackle that. As she said, the installation needed to be solved before the data cache could be addressed. I'd recommend opening a new question about getting VEP to work with the buffalo genome. That way, this thread would be about installing VEP, and all the information about the new genome would belong in the new thread. Please accept Emily's answer to mark this thread as solved. Thank you!

You don't need to generate a cache; you can run VEP directly with a GFF or GTF file and a genome FASTA. If you're having trouble with that, I agree with Ram that you should open a new post, because I'm getting very confused reading through here what is done and what links to what.

Hi Emily, many thanks for your reply. Yes, I have been struggling with this for days. First, do you suggest that I use this script?

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz


And if I do not get a result, should I open a new question about getting VEP to work with the buffalo genome?
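The first command drops comment lines, sorts by sequence name and then numerically by start and end, and compresses the result so tabix can index it. A minimal Python sketch of the same ordering can tell you whether a GFF body is already in that order. The helper names are hypothetical, and it assumes a tab-separated file; sequence names are compared by code point, which matches `sort` under LC_ALL=C but not necessarily under other locales.

```python
import sys


def _body(lines):
    # Like grep -v "#": drop every line containing '#'; also drop blank lines.
    return [line for line in lines if "#" not in line and line.strip()]


def sort_key(gff_line):
    # Same keys as: sort -k1,1 -k4,4n -k5,5n -t$'\t'
    # column 1 (seqid) as text, columns 4 and 5 (start, end) as integers.
    cols = gff_line.rstrip("\n").split("\t")
    return cols[0], int(cols[3]), int(cols[4])


def sorted_gff_body(lines):
    """Return the non-comment lines in seqid/start/end order."""
    return sorted(_body(lines), key=sort_key)


def is_sorted_like_pipeline(lines):
    """True if the non-comment lines are already in seqid/start/end order."""
    keys = [sort_key(line) for line in _body(lines)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print("sorted" if is_sorted_like_pipeline(handle) else "not sorted")
```

The numeric keys are the important part: compared as text, a start of 20 would sort before a start of 3, which is why the pipeline uses `-k4,4n` and `-k5,5n`.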

Asking about using a script is not very useful, since we have no idea whether your input data is in the same format as the data.gff file above.
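One way to answer "is my file in the right format?" before running the pipeline is to check each data line against the basic GFF3 column rules: nine tab-separated columns, integer start and end with 1 <= start <= end, and a strand of +, -, . or ?. The sketch below does only that. The ID=/Parent= check is a heuristic added here (an assumption, based on VEP linking features through those attributes), and all names are hypothetical.

```python
import sys

VALID_STRANDS = {"+", "-", ".", "?"}


def check_gff_line(line):
    """Return a list of problems found in one GFF3 data line (empty if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    problems = []
    start, end, strand, attributes = cols[3], cols[4], cols[6], cols[8]
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end must be integers")
    elif not 1 <= int(start) <= int(end):
        problems.append("need 1 <= start <= end")
    if strand not in VALID_STRANDS:
        problems.append(f"unexpected strand {strand!r}")
    # Heuristic (assumption): features VEP should link together carry ID= or Parent=.
    if "ID=" not in attributes and "Parent=" not in attributes:
        problems.append("no ID= or Parent= attribute")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue  # skip comments/directives and blank lines
            for problem in check_gff_line(line):
                print(f"line {number}: {problem}")
```

A file whose columns are separated by spaces instead of tabs shows up immediately as "found 1" column on every line, which is a common reason for the sort and tabix steps to misbehave.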

You are going to need to do this step by step. First run just grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' and see what you get. Does the output look reasonable? Then add one step at a time. It is indeed time to stop posting in this thread and ask a new question if you are not able to make any progress or run into new errors.

OK. First, I used the commands below and created the compressed file without errors:

module load SAMTools-1.4.1
grep -v "#" GCA_003121395.1_ASM312139v1_genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz


Then I ran:

tabix -p gff data.gff.gz

And then I used:

 vep -i Final_Filter_GQ_KHUZ_MAZ_EAZ_GIL_WAZ.vcf -gff data.gff.gz -fasta GCA_003121395.1_ASM312139v1_genomic.fna


But I encountered this error:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/ensembl-vep/vep:224
Date (localtime)    = Thu Oct 25 16:42:55 2018
Ensembl API version = 94
---------------------------------------------------

Looks like you need to install this module.
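The Try::Tiny error earlier in the thread, the missing tabix binary and this Bio::DB::HTS::Tabix error are all missing dependencies. A small Python sketch can check for them before running VEP: it uses `shutil.which` for programs on PATH and runs `perl -M<Module> -e 1`, which exits non-zero when Perl cannot load the module. The name lists only collect what came up in this thread; they are not VEP's official requirements list, and the function names are hypothetical.

```python
import shutil
import subprocess


def has_executable(name):
    """True if `name` is found on PATH (the same idea as `which name`)."""
    return shutil.which(name) is not None


def has_perl_module(module):
    """True if `perl -M<module> -e 1` succeeds, i.e. Perl can load the module."""
    try:
        result = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:  # no perl on PATH at all
        return False
    return result.returncode == 0


# Programs and Perl modules that came up in this thread.
EXECUTABLES = ["bgzip", "tabix"]
PERL_MODULES = ["Try::Tiny", "Bio::DB::HTS::Tabix"]

if __name__ == "__main__":
    for exe in EXECUTABLES:
        print(f"{exe:25s} {'found' if has_executable(exe) else 'MISSING'}")
    for mod in PERL_MODULES:
        print(f"{mod:25s} {'found' if has_perl_module(mod) else 'MISSING'}")
```

Run it in the same shell or environment you use for VEP: a module that is installed for a different Perl (for example, a system Perl versus a conda one) will still be reported as missing, which is exactly the situation the error describes.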

Hi genomax,

I have been able to fix the installation problem. I was able to run the command Emily had suggested (vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna), with a few warnings but no errors:

(vep) [m.rafiepour222@abrii1 ~]$ vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna
Possible precedence issue with control flow operator at /opt/anaconda2/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna27858, rna27857
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna40648
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna46030, rna46031
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna47129, rna47130
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna50084
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna54313, rna54314
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna60662
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna63492, rna63491
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna64693
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna67395, rna67394
(vep) [m.rafiepour222@abrii1 ~]$
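The warnings name GFF IDs (rna27858 and so on) that were referenced as parents but not found or skipped because of their type. A hedged way to follow up is to collect those IDs from the log and look up the feature type (column 3) each one has in the GFF. The regular expression assumes the exact warning wording shown above, and the function names are hypothetical.

```python
import re

# Assumes the exact wording of the VEP warning shown above.
PARENT_WARNING = re.compile(
    r"Parent entries with the following IDs were not found or skipped "
    r"due to invalid types: ([^\s,]+(?:,\s*[^\s,]+)*)"
)


def missing_parent_ids(log_text):
    """Return the IDs listed in every 'Parent entries ...' warning, in order."""
    ids = []
    for match in PARENT_WARNING.finditer(log_text):
        ids.extend(part.strip() for part in match.group(1).split(","))
    return ids


def feature_types(gff_lines, wanted_ids):
    """Map each wanted ID to the feature type (column 3) of the line defining it."""
    wanted = set(wanted_ids)
    found = {}
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for attribute in cols[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() == "ID" and value in wanted:
                found[value] = cols[2]
    return found
```

IDs that do not appear in the result at all were never defined in the GFF; IDs that do appear show which feature types VEP skipped. With roughly ten warnings for a whole genome annotation, this is a way to judge whether they matter, not a sign that the run failed.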


And here is part of my output:

#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
CM009840.1_932_C/A      CM009840.1:932  A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1096_A/T     CM009840.1:1096 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1107_A/G     CM009840.1:1107 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1177_C/G     CM009840.1:1177 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1276_C/T     CM009840.1:1276 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1295_G/A     CM009840.1:1295 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1471_C/A     CM009840.1:1471 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1518_A/G     CM009840.1:1518 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
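Instead of scrolling through the output to see whether everything went well, you can tally the Consequence column. This Python sketch assumes VEP's default tab-delimited output (## meta lines, one # header line, then one row per variant and feature). It splits combined terms such as missense_variant,splice_region_variant and counts rows, not distinct variants. The function name is hypothetical.

```python
from collections import Counter


def count_consequences(lines):
    """Count consequence terms in VEP's default tab-delimited output."""
    counts = Counter()
    columns = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            columns = fields  # header: #Uploaded_variation, Location, ...
            continue
        if columns is None:
            continue  # no header seen yet
        row = dict(zip(columns, fields))
        for term in row.get("Consequence", "").split(","):
            if term:
                counts[term] += 1
    return counts


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for term, n in count_consequences(handle).most_common():
                print(f"{n:10d}  {term}")
```

If nearly every row across the whole file is intergenic_variant, it may be worth checking that the sequence names in the VCF (here CM009840.1 and so on) match those in the GFF.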


Did everything go well?

Hi genomax,

I am waiting for your response. I have another question: to check whether VEP works correctly, how can I calculate SIFT? I think there should be a column named SIFT in my output, but as you can see, there isn't one.
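In VEP's default output, optional annotations appear in the Extra column as KEY=value pairs separated by semicolons (IMPACT=MODIFIER above is one such pair), so SIFT would show up there as a SIFT= entry rather than as its own column. It is only added when VEP is run with --sift, and, as far as I know, VEP looks up precomputed SIFT predictions from Ensembl data rather than running SIFT itself, so a species without Ensembl data may never get this key. The sketch below parses Extra and reports the SIFT entry per row; the function names are hypothetical.

```python
def parse_extra(extra):
    """Split VEP's Extra column ('KEY=value;KEY=value') into a dict."""
    result = {}
    if extra in ("", "-"):
        return result
    for item in extra.split(";"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


def sift_values(lines):
    """Return the SIFT entry of each data row's Extra column (None if absent)."""
    values = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        values.append(parse_extra(fields[-1]).get("SIFT"))
    return values
```

For the output shown above, every row would give None, since Extra only contains IMPACT=MODIFIER.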


