Hello,
I want to annotate each variant in my VCF file with its SNP proxies (r^2 > 0.8). For that, I am trying to use the LD plugin (https://github.com/Ensembl/VEP_plugins/blob/release/88/LD.pm) for VEP.
When running the VEP script (version 87) like this:
perl variant_effect_predictor.pl --species homo_sapiens --input_file input.vcf.gz --output_file output.vcf --cache --plugin LD,1000GENOMES:phase_3:EUR,0.8 --port 3337 --buffer_size 100000 --chr 1-22,X,Y --no_consequences
I see the following warning message in the output:
Warning: a connection to the database is required to calculate LD
Do I need to download a database before running the script with this plugin? I cannot see any annotations: my VCF has approximately 300 000 variants, and none of them were annotated with SNP proxies. For VEP I use assembly 87_GRCh37 and the cache file homo_sapiens_vep_87_GRCh37.tar.gz.
Best,
Andrei
Hi, the warning is gone but still no annotations are added. I checked by querying manually on Ensembl, and many variants in my VCF should have SNP proxies, so the LD plugin does not work as it should. I changed to the subpopulation CEU.
By removing --no_consequences I get further, but now with the following errors:
?
Hi Andrei,
We think this might be because the HTS Library is missing.
Try:
git clone --branch 1.3.2 --depth 1 https://github.com/samtools/htslib.git
cd htslib
make
Then set the path: HTSLIB_DIR=PATH_TO/htslib
Hope this helps
Best wishes
Ben Ensembl Helpdesk
It looks like I had missed a lot of the dependencies needed to make this work. I followed the installation instructions at http://dec2016.archive.ensembl.org/info/docs/api/api_installation.html.
Now something happens, but very slowly.
Hi Andrei,
LD computation is very time-consuming. We have pushed further speed improvements to https://github.com/Ensembl/VEP_plugins/tree/release/88
We advise updating the plugin and running it again.
We also recommend using the new VEP code for general speed improvements of the variant annotation: https://github.com/Ensembl/ensembl-vep
Best wishes
Ben Ensembl Helpdesk
Thank you. It works now, but slowly.
Hi again, is it possible to modify the LD plugin so that it can run offline? I sometimes encounter problems with the connection. I have downloaded all necessary files from ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/.
It would be nice to run it offline.
The error I get is the following:
Hi Andrei,
It is not currently possible to run LD computation offline, but it is something we can consider for future developments. As an alternative, you could use the API for LD computation and compute LD in a region instead of doing a computation for each variant individually: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#ld. You could also use our REST API: https://rest.ensembl.org/documentation/info/ld_region_get
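As a rough illustration of the REST route, here is a minimal Python sketch that queries the ld_region_get endpoint for one region and groups the returned pairs into proxies per variant. Several things here are assumptions rather than details from this thread: the GRCh37 REST server address (chosen because a GRCh37 cache is used above), the response field names (variation1, variation2, r2), the example region, and the helper names, which are purely illustrative. The endpoint also limits the region size, so please check the documentation linked above before relying on it.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the GRCh37 REST server, because the thread uses a GRCh37 cache.
SERVER = "https://grch37.rest.ensembl.org"


def ld_region_url(region, population, r2=0.8,
                  species="homo_sapiens", server=SERVER):
    """Build the URL for GET /ld/:species/region/:region/:population_name."""
    path = "/ld/{}/region/{}/{}".format(
        species,
        urllib.parse.quote(region, safe=":"),
        urllib.parse.quote(population, safe=":"),
    )
    # r2 asks the server to return only pairs at or above this threshold.
    return server + path + "?" + urllib.parse.urlencode({"r2": r2})


def fetch_ld_region(region, population, r2=0.8, retries=3, pause=2.0):
    """Fetch LD pairs for a region, retrying a few times on network errors."""
    request = urllib.request.Request(
        ld_region_url(region, population, r2),
        headers={"Content-Type": "application/json"},
    )
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                return json.load(response)
        except OSError:  # URLError, HTTPError and timeouts are all OSErrors
            if attempt == retries:
                raise
            time.sleep(pause * attempt)


def proxies_by_variant(pairs, min_r2=0.8):
    """Group LD pairs into {variant: {proxy: r2}} for pairs with r2 >= min_r2."""
    proxies = {}
    for pair in pairs:
        r2 = float(pair["r2"])  # assumed field name; values may be strings
        if r2 < min_r2:
            continue
        first, second = pair["variation1"], pair["variation2"]
        proxies.setdefault(first, {})[second] = r2
        proxies.setdefault(second, {})[first] = r2
    return proxies
```

A call such as proxies_by_variant(fetch_ld_region("6:25837556..25843455", "1000GENOMES:phase_3:CEU")) would then give the proxies for every variant in that window in one request. Because a failed request raises an error after the retries instead of being silently skipped, it is easier to see which regions still need to be rerun.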
Best wishes
Ben Ensembl Helpdesk
Thanks for the reply.
What happens when the LD plugin encounters connection problems like in my situation: does it try to reconnect, or does it move on to the next variant in the VCF?
Also, when it encounters the "[E::hts_open_format] fail to open file..." error, does it retry until it can open the file?
In simple terms, does the LD plugin try to reconnect or re-open the file, or does it just move on to the next variant in the VCF file when it encounters problems of this kind? I don't want to miss LD annotations for half of the VCF file because of those errors.
Best, Andrei
Dear Andrei,
If the LD plugin encounters a connection problem, VEP won’t report LD results for that variant; it moves on to the next variant and starts the LD computation again.
The plugin hasn’t been frequently used and we can go back and revise the implementation. There might be a way of computing LD without database connections. This will take some time but we can let you know when we've made some progress.
In the meantime, you can still make use of our Perl API and REST API for LD computation.
Best wishes
Ben Ensembl Helpdesk
That's a pity.
I don't have the time to learn how to use the API, so I guess I have to move on with other work and wait for you to revise the implementation. My purpose was to annotate all variants in our VCF file (approximately 300 000 variants) with SNP proxies (r^2 > 0.8).
Do you know of other tools that can do this without much effort? No calculation is needed; annotating with pre-calculated SNP proxies based on a population would work just fine.
Best, Andrei
Hi Andrei,
We would recommend https://www.cog-genomics.org/plink/2.0/
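If a pairwise LD table has already been produced with PLINK, collecting the proxies per variant takes only a few lines of Python. The sketch below assumes the whitespace-delimited .ld table written by the PLINK 1.9 --r2 command, with a header row containing SNP_A, SNP_B and R2; that format is an assumption and may differ for other PLINK versions or options. The function name is illustrative only.

```python
from collections import defaultdict


def read_plink_ld(lines, min_r2=0.8):
    """Collect {index SNP: [(proxy SNP, r2), ...]} from a PLINK-style .ld table.

    Assumes a whitespace-delimited table whose header row contains the
    columns SNP_A, SNP_B and R2 (the PLINK 1.9 --r2 layout).
    """
    proxies = defaultdict(list)
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # the first non-blank line is the header
            continue
        row = dict(zip(header, fields))
        r2 = float(row["R2"])
        # Skip weak pairs and the trivial pair of a variant with itself.
        if r2 >= min_r2 and row["SNP_A"] != row["SNP_B"]:
            proxies[row["SNP_A"]].append((row["SNP_B"], r2))
    return dict(proxies)
```

For example, opening the table with open("plink.ld") and passing the file handle to read_plink_ld returns a dictionary that can be joined to the variant IDs in the VCF; the file name here is just a placeholder.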
Best wishes
Ben Ensembl Helpdesk