Question: Using UniProt's Retrieve/ID mapping service programmatically to retrieve Taxonomy from ACC_ID
2.3 years ago, ladypurrsia wrote:

I have completed a blastx run on my samples and have obtained the following result (example):

$ head blastx_result.txt

NS500162:172:HG5CJBGXX:1:11101:2522 ZWIP2_ARATH 52.500 40 19 0 2 121 25 64 8.26e-07 44.3

I would like to take the ACC_ID, in this case ZWIP2_ARATH, and find its taxonomic information. I have many thousands of such ACC_IDs to map to taxonomy. This website, Programmatic access - Mapping database identifiers, explains with example code how to use the Retrieve/ID mapping service programmatically.
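A minimal sketch of collecting the identifiers to map, assuming the BLAST output is whitespace- or tab-delimited and that the subject ID sits in the second column, as in the example line above. The function name `unique_subject_ids` is made up for illustration.

```python
def unique_subject_ids(lines):
    """Return subject IDs (second column) in first-seen order, without duplicates."""
    seen = set()
    ordered = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        subject = fields[1]  # assumption: column 2 holds the subject ID
        if subject not in seen:
            seen.add(subject)
            ordered.append(subject)
    return ordered


# Example line copied from the BLAST output shown above.
example = [
    "NS500162:172:HG5CJBGXX:1:11101:2522 ZWIP2_ARATH 52.500 40 19 0 2 121 25 64 8.26e-07 44.3",
]
print(unique_subject_ids(example))
```

Removing duplicates before mapping keeps the ID list as short as possible, since many reads usually hit the same protein.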

On the Retrieve/ID mapping web page, you enter your ACC_ID and, under 'Select options', choose From: UniProtKB AC/ID and To: UniProtKB; like magic, it gives a file that you can download with columns such as EntryName, ProteinName, Organism, etc.

I am trying to obtain the exact same result, but programmatically (from the command line), because of my large number of ACC_IDs. It is well above the website limit, and even if I split my ACC_IDs into many small files, I would have hundreds of files to submit manually, which isn't practical.
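As an aside, splitting a long ID list into batches does not have to be done by hand. The sketch below is one way to do it; the helper name `batches`, the batch size, and the placeholder IDs are all assumptions for illustration, and the real size limit should be taken from the service's documentation.

```python
def batches(ids, size):
    """Yield successive lists of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder IDs; a real run would use the IDs taken from the BLAST output.
ids = ["ID_A", "ID_B", "ID_C", "ID_D", "ID_E"]
for number, batch in enumerate(batches(ids, 2), start=1):
    print(number, batch)
```

Each batch could then be written to its own `ACC_*` file or submitted directly in one request.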

Here is my script:

use strict;
use warnings 'all';

use LWP::UserAgent;

my @files = glob 'ACC_*'; # Files containing lists of UniProt IDs.

my $base    = 'http://www.uniprot.org';
my $tool    = 'uploadlists';
my $contact = ''; # Please set your email address here
              # to help us debug in case of problems.

my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

for my $file ( @files ) {

    # Upload one ID list per request and ask for a tab-separated table.
    my $response = $agent->post(
        "$base/$tool/",
        Content_Type => 'form-data',
        Content      => [
            file    => [ $file ],
            format  => 'tab',
            from    => 'ACC+ID',
            to      => 'ID',
            columns => 'entryname,proteinname,genename,organism',
        ],
    );

    # The service may ask the client to wait before the results are ready.
    while ( my $wait = $response->header('Retry-After') ) {
        print STDERR "Waiting ($wait)...\n";
        sleep $wait;
        $response = $agent->get($response->base);
    }

    if ( $response->is_success ) {
        print $response->content;
    }
    else {
        # Status line first, then the URI, to match the message text.
        die sprintf "Failed. Got %s for %s\n",
            $response->status_line,
            $response->request->uri;
    }
}

According to the link above (Mapping database identifiers), the code for ACC_ID is ACC+ID, and that is what I put in the 'from' field in the above code. However, the 'to' field on the Retrieve/ID mapping website is UniProtKB, for which there isn't an appropriate code in the Mapping database identifiers link. So I put in 'ID', and all this code does is return two columns, both containing my ACC_IDs side by side.

Any suggestions or clues on what I may be missing to get this to return actual taxonomy information? I'm hoping my final result would look something like this:

Entry    Entry name    Protein names        Gene names            Organism
Q9SVY1   ZWIP2_ARATH   Zinc finger protein  WIP-domain protein 2  Arabidopsis thaliana
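Once a table like this is available, looking up the organism for each hit is straightforward. The following is only a sketch: it assumes the downloaded file is tab-separated with exactly the header names shown above, and `organism_by_entry_name` is a made-up helper name.

```python
import csv
import io


def organism_by_entry_name(tsv_text):
    """Map each 'Entry name' value to its 'Organism' value from a tab-separated table."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in protein names literally
    )
    return {row["Entry name"]: row["Organism"] for row in reader}


# Header and row copied from the desired result above, joined with tabs.
table = (
    "Entry\tEntry name\tProtein names\tGene names\tOrganism\n"
    "Q9SVY1\tZWIP2_ARATH\tZinc finger protein\tWIP-domain protein 2\tArabidopsis thaliana\n"
)
lookup = organism_by_entry_name(table)
print(lookup["ZWIP2_ARATH"])
```

The resulting dictionary can be used to annotate each BLAST hit by its subject ID.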

Thank you TONS!

Added comment: I have written a similar question here: Retrieving Taxonomy from Uniprot/Swissprot ACC_ID From Blastx Results. The answer to that question shows how I can retrieve taxonomic information for one ACC_ID at a time.

uniprot blastx taxonomy • 1.6k views

Please validate (green mark on the left) or comment on your previous question: Retrieving Taxonomy from Uniprot/Swissprot ACC_ID From Blastx Results

written 2.3 years ago by Pierre Lindenbaum

Pierre: My apologies. I have just added the link to that particular question to this question.

written 2.3 years ago by ladypurrsia
2.3 years ago, Elisabeth Gasteiger (Geneva) wrote:

UniProtKB column names for programmatic access are documented here: http://www.uniprot.org/help/uniprotkb_column_names

You can also use the UniProt web interface to build a query interactively, use the "Columns" button to customize your table layout, and then click the "Share" button to see the URL this generates, with the relevant column names. You can then use this URL, or just the column names, in your code.

http://www.uniprot.org/help/customize

http://insideuniprot.blogspot.ch/2015_03_01_archive.html
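To make this concrete, here is a rough sketch of how documented column names could be carried into the request fields. The column names listed and the target code "ACC" are assumptions, not values confirmed in this thread, so check them against the column-name page linked above. `mapping_fields` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed column names; verify each one against the documentation page.
ASSUMED_COLUMNS = ["id", "entry name", "protein names", "genes", "organism"]


def mapping_fields(columns, target="ACC"):
    """Return form fields for an ID-mapping request that asks for extra columns."""
    return {
        "from": "ACC+ID",  # source code used in the question's script
        "to": target,      # assumption: "ACC" targets UniProtKB entries
        "format": "tab",
        "columns": ",".join(columns),
    }


fields = mapping_fields(ASSUMED_COLUMNS)
print(urlencode(fields))  # URL-encoded form of the same fields
```

The same dictionary could be passed as form data by whatever HTTP client is in use. The point is only that the columns value is a comma-separated list of the documented names.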
