Problem With Ensembl Variant Effect Predictor Stand Alone Perl Tool
13.0 years ago
Nasir ▴ 20

Hi All

I would be grateful for your help with this problem.

I am annotating SNPs in VCF files from the 1000 Genomes Project using the stand-alone Ensembl Variant Effect Predictor Perl script, variant_effect_predictor.pl. Sometimes I get the correct output file, but at other times I run into the following problems: (1) each output file takes a long time to generate; (2) not all variants are annotated, with some SNPs missing from the output file; and (3) sometimes I get no output file at all, only the following error:

    $ perl variant_effect_predictor.pl -i ABCA12.vcf -format vcf -hgnc -sift b -polyphen b -condel b -o ABCA12phase.vep

    Could not connect to database homo_sapiens_core_62_37g as user anonymous using [DBI:mysql:database=homo_sapiens_core_62_37g;host=ensembldb.ensembl.org;port=5306] as a locator:
    Lost connection to MySQL server at 'reading initial communication packet', system error: 0 at /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/DBConnection.pm line 290, <GEN0> line 186.

    -------------------- EXCEPTION --------------------
    MSG: Could not connect to database homo_sapiens_core_62_37g as user anonymous using [DBI:mysql:database=homo_sapiens_core_62_37g;host=ensembldb.ensembl.org;port=5306] as a locator:
    Lost connection to MySQL server at 'reading initial communication packet', system error: 0
    STACK Bio::EnsEMBL::DBSQL::DBConnection::connect /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/DBConnection.pm:299
    STACK Bio::EnsEMBL::DBSQL::DBConnection::db_handle /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/DBConnection.pm:618
    STACK Bio::EnsEMBL::DBSQL::DBConnection::prepare /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/DBConnection.pm:647
    STACK Bio::EnsEMBL::DBSQL::BaseAdaptor::generic_fetch /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/BaseAdaptor.pm:509
    STACK Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::_slice_fetch /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm:495
    STACK Bio::EnsEMBL::DBSQL::BaseFeatureAdaptor::fetch_all_by_Slice_constraint /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/BaseFeatureAdaptor.pm:316
    STACK Bio::EnsEMBL::DBSQL::TranscriptAdaptor::fetch_all_by_Slice /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/DBSQL/TranscriptAdaptor.pm:372
    STACK Bio::EnsEMBL::Slice::get_all_Transcripts /usr/local/lib/perl/5.10.1/Bio/EnsEMBL/Slice.pm:2398
    STACK Bio::EnsEMBL::Variation::VariationFeature::get_all_TranscriptVariations /usr/local/share/perl/5.10.1/Bio/EnsEMBL/Variation/VariationFeature.pm:382
    STACK main::print_consequences variant_effect_predictor.pl:233
    STACK main::main variant_effect_predictor.pl:205
    STACK toplevel variant_effect_predictor.pl:44
    Ensembl API version = 62

I cannot decipher this error message and would be grateful for suggestions on how to deal with the problems above.

ensembl variant • 7.9k views

Are you working behind a firewall?
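
One quick way to test this is to check whether the database port can be reached at all from your machine. The Python sketch below is only an illustration: the helper name `can_reach` is made up for this example, and the host and port in the usage note are copied from the error message in the question. It only tests that a TCP connection can be opened; the "reading initial communication packet" error can still occur if something drops the connection after it has been opened.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Calling `can_reach("ensembldb.ensembl.org", 5306)` from the machine that runs the script should return `True` if nothing between you and the server blocks that port.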


In case you are interested, I've already annotated that using my own tool (SnpEff: http://snpeff.sourceforge.net/). The process takes 20 minutes or so.

Here are the results: http://www.mcb.mcgill.ca/~pcingola/1k_genomes/1000_Genomes_snpEff.txt.gz


Thank you Pablo. In fact, I moved to using your very useful tool (snpEff) because of the slow progress I was making with using variant effect predictor.


Great, send me an email if you have any questions.

13.0 years ago
Drio ▴ 920
  1. You are querying a remote server, and every query carries a lot of overhead, so it is slow. You have several options:

a. Replicate the Ensembl MySQL database locally and query your local server.

b. Split your SNP list and query the server in parallel. This increases the load on the server, but it has worked well for me and I have not seen any performance penalty from this approach.

c. Try another tool that downloads the Ensembl databases locally and builds a data structure in memory. Annovar is one option; it can annotate millions of SNPs in less than an hour on a regular machine.

  2. Did you notice there is a limit on the number of SNPs you can send to Ensembl? The limit is 1000 SNPs. Could that be what is causing the problem?

  3. From the error message, it seems the socket linking your local machine to the server was broken. The next time that happens, try connecting with the mysql client to see whether its error message gives you any extra information that helps you troubleshoot the problem.
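
For option (b), the splitting step could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the function name `split_vcf`, the chunk size, and the `partNNN` file naming are all made up for this example, and the whole file is read into memory, which is fine for a single-gene VCF but not for a whole genome.

```python
from pathlib import Path
from typing import List


def split_vcf(vcf_path: str, out_dir: str, records_per_chunk: int = 500) -> List[Path]:
    """Split a VCF into chunk files, repeating the original header lines in each."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    src = Path(vcf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    header: List[str] = []
    records: List[str] = []
    with src.open() as handle:
        for line in handle:
            if line.startswith("#"):
                header.append(line)      # meta-information and column header lines
            elif line.strip():
                records.append(line)     # one variant record per line

    chunks: List[Path] = []
    for start in range(0, len(records), records_per_chunk):
        number = start // records_per_chunk + 1
        chunk_path = out / f"{src.stem}.part{number:03d}.vcf"
        with chunk_path.open("w") as fh:
            fh.writelines(header)
            fh.writelines(records[start:start + records_per_chunk])
        chunks.append(chunk_path)
    return chunks
```

Each chunk can then be passed to the script with its own `-i` and `-o` arguments, and the output files concatenated afterwards. Keep the number of parallel jobs small, since every job adds load on the public server.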


I've not used the new version of the snp effect predictor. How can you run it locally if it performs polyphen and sift? Are the polyphen and sift analyses on the variants now done at the time of database creation and stored in the ensembl db?


Many thanks Drio. Can anyone please point me in the general direction of how to replicate the ensembl mysql database locally?

13.0 years ago
Fiona ▴ 70

Hello,

I just wanted to add that the 1000-variant limit applies only to the online version, not to the downloaded script.

If you have a very large amount of data, you can also try running the script in whole-genome mode - please refer to the README file that comes with the script for guidance before doing this. ftp://ftp.ensembl.org/pub/misc-scripts/Variant_effect_predictor_2.0/

13.0 years ago
Willm ▴ 30

Hello,

1) As Fiona stated, you can try using whole-genome mode (add the -w flag to your command line). You should ensure that the data you have is suitable - the file should be ordered by chromosome and position, and ideally should represent a contiguous region (e.g. a gene, set of genes or a whole chromosome). You can refer to the README for more information about this.

2) If SNPs are missing from the output it means that they do not overlap or fall near any Ensembl-annotated transcripts - you can consider them to be intergenic with no predicted consequence.

3) As drio stated, you are querying a remote database, so connection issues can and will occasionally occur. To eliminate these, consider setting up a local copy of the human core Ensembl database.
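
Regarding point 1, a quick pre-flight check of the ordering requirement might look like the Python sketch below. It is an illustration only: the function name `find_order_problems` is made up here, and it tests nothing more than "records grouped by chromosome, positions ascending within each chromosome". It does not check that the region is contiguous.

```python
from typing import Iterable, List, Tuple


def find_order_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, message) for records that break chromosome/position order."""
    problems: List[Tuple[int, str]] = []
    finished = set()      # chromosomes whose block of records has already ended
    current = None        # chromosome of the previous record
    last_pos = 0          # position of the previous record
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue      # skip header and blank lines
        chrom, pos_text = line.split(None, 2)[:2]
        pos = int(pos_text)
        if chrom != current:
            if chrom in finished:
                problems.append((line_no, f"{chrom} reappears after other chromosomes"))
            if current is not None:
                finished.add(current)
            current = chrom
        elif pos < last_pos:
            problems.append((line_no, f"{chrom}:{pos} comes after {chrom}:{last_pos}"))
        last_pos = pos
    return problems
```

An empty list from `find_order_problems(open("ABCA12.vcf"))` would suggest the file is ordered as whole-genome mode expects.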


Hello,

My dataset contains 5 SNVs, but only 3 of them were annotated by the Variant Effect Predictor tool. One of the two unannotated positions is a known SNP, yet the script, run with default parameters, returns no result for it. Here is the line in question:

    20      57206550        .       G       A       30.88   PASS    AC=2;AF=1.00;AN=2;DP=3;Dels=0.00;HRun=3;HaplotypeScore=0.0000;MQ=35.51;MQ0=0;QD=10.29;SB=-0.01;sumGLbyD=21.03   GT:AD:DP:GQ:PL  1/1:0,3:2:6.01:62,6,0

Is it possible to obtain at least the dbSNP ID for this type of variant using the Ensembl APIs?

Best regards,
S.
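
To pin down exactly which input records are absent from the output, a small comparison along the following lines may help. This Python sketch rests on assumptions that should be checked against your own files: that the output is tab-delimited with `#` header lines, that its second column is a Location of the form `chr:pos` or `chr:start-end`, and that chromosome names match between the two files. The function names are made up for this example, and the position matching is only meaningful for SNVs, since indel coordinates can differ from the VCF POS.

```python
from typing import Iterable, List, Set, Tuple


def annotated_sites(vep_lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (chromosome, start) pairs from the assumed Location column."""
    sites: Set[Tuple[str, int]] = set()
    for line in vep_lines:
        if line.startswith("#") or not line.strip():
            continue
        location = line.split("\t")[1]           # ASSUMPTION: column 2 holds Location
        chrom, _, span = location.partition(":")
        sites.add((chrom, int(span.split("-")[0])))
    return sites


def unannotated_records(vcf_lines: Iterable[str], vep_lines: Iterable[str]) -> List[str]:
    """Return the VCF records whose (chromosome, position) never appears in the output."""
    seen = annotated_sites(vep_lines)
    missing: List[str] = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split(None, 2)[:2]
        if (chrom, int(pos)) not in seen:
            missing.append(line.rstrip("\n"))
    return missing
```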

13.0 years ago
Pi ▴ 520

Is there an error in the parse_line function?

pileup: chr1 60 T A

    if(
       ($config->{input_format} =~ /pileup/i) ||
       (
            $data[0] =~ /(chr)?\w+/ &&
            $data[1] =~ /\d+/ &&
            $data[2] =~ /^[ACGTN-]+$/ &&
            $data[3] =~ /^[ACGTNRYSWKM*+\/-]+$/
        )
    ) {
        my @return = ();

        if($data[2] ne "*"){
            my $var;

            if($data[2] =~ /^[A|C|G|T]$/) {    # index in question
                $var = $data[2];               # index in question
            }
            else {
                ($var = unambiguity_code($data[3])) =~ s/$data[2]//ig;

Shouldn't the $data[2] in the inner if test and in the assignment to $var be $data[3], which contains the alternate allele (genotype)?


Thank you, Pi; this should indeed be $data[3].

I will patch in a fix to the code on the Ensembl CVS server.
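
For readers following along, the intended logic can be illustrated outside the script with a short Python sketch. This is not the Ensembl code: the function name `pileup_variant_alleles` and the small IUPAC table are stand-ins written for this example, and only the first four pileup columns (chromosome, position, reference base, genotype) are considered.

```python
# Two-base IUPAC ambiguity codes mapped to the bases they stand for.
IUPAC = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def pileup_variant_alleles(line: str) -> str:
    """Return the non-reference allele(s) of a simple pileup line, e.g. 'A' or 'C/G'."""
    _chrom, _pos, ref, genotype = line.split()[:4]
    ref, genotype = ref.upper(), genotype.upper()
    if genotype in ("A", "C", "G", "T"):
        # Unambiguous genotype: the allele comes from column 4, not from the reference.
        return genotype
    bases = IUPAC.get(genotype)
    if bases is None:
        raise ValueError(f"unsupported genotype code: {genotype!r}")
    # Heterozygous call: expand the code and drop the reference base if present.
    return "/".join(base for base in bases if base != ref)
```

With the example line above, `pileup_variant_alleles("chr1 60 T A")` gives `A`, whereas reading the allele from the reference column would have given `T`.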
