Question

How to specify columns in a UniProt batch query from UniProt IDs?

6.1 years ago

Solowars ▴ 70

Dear all,

My question is similar to "How To Programmatically Retrieve A Batch Of Fasta Sequences From For A List Of Uniprot Accession Ids?", in which someone retrieves FASTA sequences from UniProt for a list of UniProt protein IDs.

In my case, I also have a list of UniProt IDs, and I tried the Perl script from that example (see the link above). Although I am a complete Perl layman, I managed to get the main result table into Excel. However, I would like to add some more columns (e.g. sequences), which is possible in the web service. Is there an easy way to modify that Perl example to specify the columns I want?

A related question: is it possible to retrieve the same information using R instead? I found a couple of UniProt-related packages in Bioconductor, but from reading the manuals I don't think they can do the trick...

Thank you very much!

uniprot perl batch proteinID • 2.4k views

ADD COMMENT • link updated 6.1 years ago by Elisabeth Gasteiger ★ 2.4k • written 6.1 years ago by Solowars ▴ 70

In Java: use the XML schema for UniProt to generate code. See "How To Retrieve Human Proteins Sequence Containing A Given Domain" and "Finding Single Domain Proteins".

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 161k

Hi Pierre! I'll give it a try, even though I'm afraid my knowledge of Java is even more limited than my knowledge of Perl... Thanks!

ADD REPLY • link 6.1 years ago by Solowars ▴ 70

score 2 · Answer 1 · 2018-03-19

You can try something like the script below. The comma-separated 'columns' field in the request is where you list the columns you want; the available column names are documented at https://www.uniprot.org/help/uniprotkb_column_names :

use strict;
use warnings;
use LWP::UserAgent;

my $list = $ARGV[0]; # File containing list of UniProt identifiers.

my $base = 'http://www.uniprot.org';
my $tool = 'uploadlists';

my $contact = ''; # Please set your email address here to help us debug in case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

my $response = $agent->post("$base/$tool/",
   [ 'file' => [$list],
     'format' => 'tab',
     'columns'=> 'id,protein_names,genes,length',
     'from' => 'ACC+ID',
     'to' => 'ACC',
   ],
   'Content_Type' => 'form-data');

while (my $wait = $response->header('Retry-After')) {
    print STDERR "Waiting ($wait)...\n";
    sleep $wait;
    $response = $agent->get($response->base);
}