Question: LD calculation using LD plugin for VEP
2.2 years ago by
eXpander100
Sweden
eXpander100 wrote:

Hello,

I want to annotate each variant in my VCF file with its SNP proxies (r^2 > 0.8). To do that, I am trying to use the LD plugin (https://github.com/Ensembl/VEP_plugins/blob/release/88/LD.pm) for VEP.

When running the VEP script (version 87) like this:

perl variant_effect_predictor.pl --species homo_sapiens --input_file input.vcf.gz --output_file output.vcf --cache --plugin LD,1000GENOMES:phase_3:EUR,0.8 --port 3337 --buffer_size 100000 --chr 1-22,X,Y --no_consequences

I see the following warning message in the output:

Warning: a connection to the database is required to calculate LD

Do I need to download a database before running the script with this plugin? I cannot see any annotations: none of the approximately 300,000 variants in my VCF were annotated with SNP proxies. For VEP I use release 87 with assembly GRCh37 and the cache file homo_sapiens_vep_87_GRCh37.tar.gz.

Best,

Andrei

ld vep linkage disequilibrium • 1.0k views
ADD COMMENTlink modified 2.1 years ago by Ben_Ensembl970 • written 2.2 years ago by eXpander100
2.2 years ago by
Ben_Ensembl970
EMBL-EBI
Ben_Ensembl970 wrote:

Hello Andrei,

We have updated the LD plugin, which should resolve the error you are encountering. Please get the latest version of the plugin from GitHub: https://github.com/Ensembl/VEP_plugins/blob/release/88/LD.pm

Also, please be aware that you should use a specific 1000 Genomes population (e.g. GBR), not a 'super-population' (e.g. EUR), as the large dataset sizes will cause the plugin to fail.

Best wishes

Ben Ensembl Helpdesk

ADD COMMENTlink written 2.2 years ago by Ben_Ensembl970
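
To make the population advice above concrete, here is a minimal Python sketch that builds the value for VEP's --plugin option in the same shape as the command in the question (LD,1000GENOMES:phase_3:EUR,0.8) and rejects super-population codes before a long run is started. The helper name ld_plugin_arg is hypothetical, and treating ALL as too large is an assumption rather than something stated in the thread.

```python
# 1000 Genomes phase 3 super-population (continental group) codes.
SUPER_POPULATIONS = {"AFR", "AMR", "EAS", "EUR", "SAS"}


def ld_plugin_arg(population, r2_cutoff=0.8, panel="1000GENOMES:phase_3"):
    """Build the '--plugin' value for the LD plugin (hypothetical helper).

    The 'LD,<panel>:<population>,<r2>' shape is copied from the command
    shown in the question; nothing else about the plugin is assumed.
    """
    code = population.upper()
    # Assumption: 'ALL' is treated like a super-population because it is
    # at least as large as any of them.
    if code in SUPER_POPULATIONS or code == "ALL":
        raise ValueError(
            f"{code} is a super-population; use a specific population such as GBR or CEU"
        )
    if not 0.0 <= r2_cutoff <= 1.0:
        raise ValueError("r2 cutoff must be between 0 and 1")
    return f"LD,{panel}:{code},{r2_cutoff:g}"


if __name__ == "__main__":
    print(ld_plugin_arg("GBR"))
```

The returned string would be passed after --plugin on the VEP command line in place of the EUR value.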

Hi, the warning is gone but still no annotations are added. I checked by querying Ensembl manually, and many of the variants in my VCF should have SNP proxies. So the LD plugin does not work as it should. I switched to the sub-population CEU.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by eXpander100

By removing --no_consequences I get further, but now with the following error:

WARNING: Plugin 'LD' went wrong: 
-------------------- EXCEPTION --------------------
MSG: Could not get adaptor VCFCollection for homo_sapiens variation

STACK Bio::EnsEMBL::DBSQL::DBAdaptor::AUTOLOAD /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:996
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::_fetch_by_Slice_VCF /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:529
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_Slice /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:190
STACK Bio::EnsEMBL::Variation::DBSQL::LDFeatureContainerAdaptor::fetch_by_VariationFeature /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/DBSQL/LDFeatureContainerAdaptor.pm:317
STACK LD::run /home/alexsson/.vep/Plugins/LD.pm:165
STACK (eval) /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2514
STACK Bio::EnsEMBL::Variation::Utils::VEP::run_plugins /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2513
STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2591
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2239
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1682
STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1371
STACK main::main /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/variant_effect_predictor.pl:322
STACK toplevel /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/variant_effect_predictor.pl:149
Date (localtime)    = Fri Mar 31 13:14:59 2017
Ensembl API version = 87

?

ADD REPLYlink written 2.2 years ago by eXpander100

Hi Andrei,

We think this might be because the HTS Library is missing.

Try:

git clone --branch 1.3.2 --depth 1 https://github.com/samtools/htslib.git
cd htslib
make

Then set the path:

HTSLIB_DIR=PATH_TO/htslib

Hope this helps

Best wishes

Ben Ensembl Helpdesk

ADD REPLYlink written 2.2 years ago by Ben_Ensembl970

It looks like I had missed a lot of dependencies needed to make this work. I followed the instructions at http://dec2016.archive.ensembl.org/info/docs/api/api_installation.html.

Now something happens, but very slowly.

ADD REPLYlink written 2.2 years ago by eXpander100

Hi Andrei,

LD computation is very time-consuming. We have pushed further speed improvements to https://github.com/Ensembl/VEP_plugins/tree/release/88

We advise updating the plugin and running again.

We also recommend using the new VEP code for general speed improvements in variant annotation: https://github.com/Ensembl/ensembl-vep

Best wishes

Ben Ensembl Helpdesk

ADD REPLYlink written 2.2 years ago by Ben_Ensembl970

Thank you. It works now, but slowly.

ADD REPLYlink written 2.2 years ago by eXpander100

Hi again. Is it possible to modify the LD plugin so that it can run offline? I sometimes encounter problems with the connection. I have downloaded all the necessary files from ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/.

It would be nice to run it offline.

The error I get is the following:

 WARNING: Plugin 'LD' went wrong: 
-------------------- EXCEPTION --------------------
MSG: Could not connect to database homo_sapiens_variation_87_37 as user anonymous using [DBI:mysql:database=homo_sapiens_variation_87_37;host=ensembldb.ensembl.org;port=3337] as a locator:
DBI connect('database=homo_sapiens_variation_87_37;host=ensembldb.ensembl.org;port=3337','anonymous',...) failed: Can't connect to MySQL server on 'ensembldb.ensembl.org' (110 "Connection timed out") at /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/DBConnection.pm line 260.

STACK Bio::EnsEMBL::DBSQL::DBConnection::connect /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/DBConnection.pm:276
STACK Bio::EnsEMBL::DBSQL::DBConnection::db_handle /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/DBConnection.pm:673
STACK Bio::EnsEMBL::DBSQL::DBConnection::prepare /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/DBConnection.pm:701
STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::generic_fetch /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:475
STACK Bio::EnsEMBL::Variation::DBSQL::VariationAdaptor::fetch_by_name /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/DBSQL/VariationAdaptor.pm:526
STACK LD::run /home/alexsson/.vep/Plugins/LD.pm:153
STACK (eval) /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2514
STACK Bio::EnsEMBL::Variation::Utils::VEP::run_plugins /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2513
STACK Bio::EnsEMBL::Variation::Utils::VEP::vfoa_to_line /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2591
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_to_consequences /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:2239
STACK Bio::EnsEMBL::Variation::Utils::VEP::vf_list_to_cons /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1682
STACK Bio::EnsEMBL::Variation::Utils::VEP::get_all_consequences /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/Bio/EnsEMBL/Variation/Utils/VEP.pm:1371
STACK main::main /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/variant_effect_predictor.pl:295
STACK toplevel /home/alexsson/Programs/ensembl-tools-release-87/scripts/variant_effect_predictor/variant_effect_predictor.pl:149
Date (localtime)    = Tue Apr  4 10:55:37 2017
Ensembl API version = 87
---------------------------------------------------
[============================================================================================================================================================================================================>                                                                                                    ]   [ 67% ][kftp_connect_file] 350 Restart position accepted (0).
[E::hts_open_format] fail to open file 'ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.vcf.gz'
[E::hts_open_format] fail to open file 'ftp://ftp.ensembl.org/pub/grch37/release-82/variation/vcf/homo_sapiens/1000GENOMES-phase_3-genotypes/ALL.chr1.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.vcf.gz'
ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by eXpander100

Hi Andrei,

It is not currently possible to run the LD computation offline, but it is something we can consider for future development. As an alternative, you could use the Perl API and compute LD for a region instead of for each variant individually: http://www.ensembl.org/info/docs/api/variation/variation_tutorial.html#ld. Or you could use our REST API: https://rest.ensembl.org/documentation/info/ld_region_get

Best wishes

Ben Ensembl Helpdesk

ADD REPLYlink written 2.1 years ago by Ben_Ensembl970
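
As a rough illustration of the REST route suggested above, the following Python sketch builds a request URL for the region LD endpoint and filters the returned pairs at r^2 >= 0.8 on the client side. Several things here are assumptions to check against the linked documentation: the GRCh37 server address, the path layout /ld/<species>/region/<region>/<population>, and the JSON field names variation1, variation2 and r2 (with r2 delivered as a string). The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GRCh37 REST server, since the thread uses a GRCh37 cache.
SERVER = "https://grch37.rest.ensembl.org"


def ld_region_url(region, population, species="homo_sapiens", server=SERVER):
    """Build the URL for the region LD endpoint (path layout is an assumption)."""
    region_part = urllib.parse.quote(region, safe=":.")
    population_part = urllib.parse.quote(population, safe=":")
    return f"{server}/ld/{species}/region/{region_part}/{population_part}"


def fetch_ld_pairs(url, timeout=60):
    """Fetch and decode the JSON list of variant pairs (needs network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def proxies(pairs, r2_cutoff=0.8):
    """Map each variant to the set of partners with r2 >= r2_cutoff."""
    result = {}
    for pair in pairs:
        # float() handles r2 given either as a number or as a string.
        if float(pair["r2"]) >= r2_cutoff:
            first, second = pair["variation1"], pair["variation2"]
            result.setdefault(first, set()).add(second)
            result.setdefault(second, set()).add(first)
    return result


if __name__ == "__main__":
    url = ld_region_url("1:1000000..1100000", "1000GENOMES:phase_3:CEU")
    print(proxies(fetch_ld_pairs(url)))
```

One request per region, rather than one per variant, is the point of the suggestion: variants that sit close together share a single call.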

Thanks for the reply.

What happens when the LD plugin encounters connection problems like the ones in my situation? Does it try to reconnect, or does it move on to the next variant in the VCF?

Also, when it encounters "[E::hts_open_format] fail to open file...", does it retry until it can open the file?

In simple terms: does the LD plugin try to reconnect or re-open the file, or does it just move on to the next variant in the VCF file when it hits problems of this kind? I don't want to miss LD annotations for half of the VCF file because of those errors.

Best, Andrei

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by eXpander100

Dear Andrei,

If the LD plugin encounters a connection problem, VEP won't report LD results for that variant; it moves on to the next variant and starts the LD computation again.

The plugin hasn't been used frequently, and we can go back and revise the implementation. There might be a way of computing LD without database connections. This will take some time, but we can let you know when we've made some progress.

In the meantime, you can still make use of our Perl API and REST API for LD computation.

Best wishes

Ben Ensembl Helpdesk

ADD REPLYlink written 2.1 years ago by Ben_Ensembl970
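
Because the plugin skips a variant rather than retrying, anyone scripting their own requests (for example against the REST API) may want to add retries themselves. Below is a minimal Python sketch of retry with exponential backoff. It is not part of VEP or the LD plugin; the name with_retries and the default attempt count and delay are illustrative assumptions.

```python
import time


def with_retries(func, attempts=5, base_delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds or 'attempts' calls have failed.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries.
    OSError covers socket timeouts and urllib.error.URLError.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up and surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Simulated connection that times out twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("Connection timed out")
        return "connected"

    print(with_retries(flaky, base_delay=0.01))
```

Re-raising after the final attempt means a variant or region that never succeeds is reported as a failure instead of being silently left without LD results.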

That's a pity.

I don't have time in my schedule to learn how to use the API, so I guess I will have to move on to other things and wait for you to revise the implementation. My goal was to annotate all variants in our VCF file (approximately 300,000 variants) with SNP proxies (r^2 > 0.8).

Do you know of other tools that can do this without much effort? No calculation is needed: just annotating with pre-calculated SNP proxies for a given population would work fine.

Best, Andrei

ADD REPLYlink written 2.1 years ago by eXpander100

Hi Andrei,

We would recommend PLINK: https://www.cog-genomics.org/plink/2.0/

Best wishes

Ben Ensembl Helpdesk

ADD REPLYlink written 2.1 years ago by Ben_Ensembl970
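
For readers unsure what the r^2 > 0.8 threshold measures, this is a small Python sketch of the textbook r^2 statistic for two biallelic sites, computed from phased haplotypes: D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). It is not PLINK's or the LD plugin's implementation, and the 0/1 allele coding and the function name are assumptions made for illustration.

```python
def r_squared(hap_a, hap_b):
    """Return r^2 between two biallelic sites, or None if a site is monomorphic.

    hap_a and hap_b hold one 0/1 allele per haplotype, in the same
    haplotype order (so phased data is assumed).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("need two non-empty allele lists of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at site A
    p_b = sum(hap_b) / n  # frequency of allele 1 at site B
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return None  # r^2 is undefined when either site does not vary
    d = p_ab - p_a * p_b
    return d * d / denominator


if __name__ == "__main__":
    site_a = [0, 1, 0, 1, 1, 0, 1, 0]
    site_b = [0, 1, 0, 1, 1, 0, 0, 0]
    print(r_squared(site_a, site_b))
```

A proxy SNP in the sense used in this thread is a partner variant for which this value exceeds 0.8 in the chosen population.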
2.1 years ago by
Ben_Ensembl970
EMBL-EBI
Ben_Ensembl970 wrote:

Hi Andrei,

I'm just looking into this with my colleagues at the moment. I'll get back to you as soon as possible.

Best wishes

Ben

ADD COMMENTlink written 2.1 years ago by Ben_Ensembl970

Hi,

Is it possible to add "distance" as an argument? What is the default distance for the LD calculation now, 500 kb?

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by eXpander100

Hi Andrei,

No, it's not possible to add 'distance' as an argument. For LD calculations, the default distance is 100 kb upstream and 100 kb downstream of the focus variant.

Best wishes

Ben

ADD REPLYlink written 2.0 years ago by Ben_Ensembl970
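
The 100 kb window can be reproduced outside the plugin when choosing regions to query yourself. The Python sketch below computes the window around one focus variant and merges overlapping windows for several variants on the same chromosome, so that nearby variants share one region. The chrom:start..end string format and both function names are assumptions for illustration, and the window is not clipped at the chromosome end.

```python
FLANK = 100_000  # 100 kb on each side of the focus variant, as stated above


def ld_window(chrom, pos, flank=FLANK):
    """Return 'chrom:start..end' for the window around a 1-based position."""
    start = max(1, pos - flank)  # never extend before the first base
    end = pos + flank            # note: not clipped to the chromosome length
    return f"{chrom}:{start}..{end}"


def merged_windows(positions, flank=FLANK):
    """Merge overlapping windows for positions on a single chromosome."""
    windows = []
    for pos in sorted(positions):
        start, end = max(1, pos - flank), pos + flank
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    print(ld_window("1", 1_000_000))
    print(merged_windows([1_000_000, 1_050_000, 5_000_000]))
```

With about 300,000 variants, merging is what keeps the number of regions manageable, since dense stretches collapse into a single window.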