How to specify columns in Uniprot batch query from Uniprot IDs?
6.1 years ago
Solowars ▴ 70

Dear all,

My question is similar to "How To Programmatically Retrieve A Batch Of Fasta Sequences From For A List Of Uniprot Accession Ids?", in which someone tries to retrieve FASTA sequences from UniProt for a list of UniProt protein IDs.

In my case, I also have a list of UniProt IDs, and I tried the Perl script from that example (see link above). Despite being a complete Perl layman, I managed to retrieve the main result table and open it in Excel. However, I would like to add some more columns (e.g. sequences), which is possible in the web service. Is there an easy way to modify that Perl example to specify the columns I want?

A related question: is it possible to retrieve the same information using R instead? I found a couple of UniProt-related packages in Bioconductor, but from reading the manuals I don't think they can do the trick...

Thank you very much!

uniprot perl batch proteinID • 2.3k views
In Java: use the UniProt XML schema to generate code. See "How To Retrieve Human Proteins Sequence Containing A Given Domain" and "Finding Single Domain Proteins".

Hi Pierre! I'll give it a try, even though I'm afraid that my knowledge of Java is even more limited than Perl... Thanks!

6.1 years ago

You can try something like the script below. The `columns` field sets which columns are returned; the available column names are documented at https://www.uniprot.org/help/uniprotkb_column_names :

use strict;
use warnings;
use LWP::UserAgent;

my $list = $ARGV[0]; # File containing the list of UniProt identifiers.

my $base = 'http://www.uniprot.org';
my $tool = 'uploadlists';

my $contact = ''; # Please set your email address here to help us debug in case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

my $response = $agent->post("$base/$tool/",
   [ 'file' => [$list],
     'format' => 'tab',
     'columns'=> 'id,protein_names,genes,length',
     'from' => 'ACC+ID',
     'to' => 'ACC',
   ],
   'Content_Type' => 'form-data');

while (my $wait = $response->header('Retry-After')) {
    print STDERR "Waiting ($wait)...\n";
    sleep $wait;
    $response = $agent->get($response->base);
}
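
To add more columns, only the comma-separated `columns` value needs to change. The sketch below shows one way to build the form fields from a list of column names. `build_query_fields` is a hypothetical helper, `ids.txt` is a placeholder file name, and `sequence` is assumed to be the column name for the protein sequence; please check it against the column-name page linked above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): build the form
# fields for the POST request from a list file and a list of column names.
sub build_query_fields {
    my ($list_file, @columns) = @_;
    return [
        'file'    => [$list_file],
        'format'  => 'tab',
        'columns' => join(',', @columns),   # comma-separated, no spaces
        'from'    => 'ACC+ID',
        'to'      => 'ACC',
    ];
}

# 'sequence' is an assumed column name; 'ids.txt' is a placeholder.
my $fields = build_query_fields('ids.txt',
    qw(id protein_names genes length sequence));

# Show the value that would be sent as the 'columns' field.
my %as_hash = @$fields;
print "$as_hash{columns}\n";
```

The returned array reference could then be passed as the second argument to `$agent->post(...)` in place of the literal field list.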
It worked! It produced no output at first (the script above never prints the response), so I added the output chunk from the example:

$response->is_success ?
  print $response->content :
  die 'Failed, got ' . $response->status_line .
    ' for ' . $response->request->uri . "\n";

and it worked like a charm!
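
If the table is needed inside Perl rather than in Excel, the tab-separated text can be split into one record per row. This is only a sketch: `parse_tab` is a hypothetical helper, and the sample text, including the `Entry` and `Length` headers and the IDs, is made up to stand in for `$response->content`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn tab-separated text (header line first)
# into a list of hash references keyed by column header.
sub parse_tab {
    my ($content) = @_;
    my @lines = grep { length } split /\r?\n/, $content;
    return () unless @lines;
    my @header = split /\t/, shift @lines;
    my @rows;
    for my $line (@lines) {
        my @values = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        my %row;
        @row{@header} = @values;              # pair each header with its value
        push @rows, \%row;
    }
    return @rows;
}

# Made-up sample text standing in for $response->content.
my $sample = "Entry\tLength\nID_A\t10\nID_B\t25\n";
for my $row (parse_tab($sample)) {
    print "$row->{Entry} has length $row->{Length}\n";
}
```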
