Question: How To Programmatically Retrieve A Batch Of Fasta Sequences For A List Of Uniprot Accession Ids?
6.8 years ago by Superbest (120), United States, wrote:

I have a list of UniProtKB Protein Accessions. I want to fetch the peptide sequence corresponding to each accession in fasta format. Preferably, all the fasta sequences should be combined into a single file.

I would like to do this programmatically, with C#.

UniProt's FAQ explains how to do it. However, the code example they give is in Perl:

use strict;
use warnings;
use LWP::UserAgent;

my $list = $ARGV[0]; # File containing list of UniProt identifiers.

my $base = '';
my $tool = 'batch';

my $contact = ''; # Please set your email address here to help us debug in case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

# Upload the identifier file as a multipart form and request plain-text output.
my $response = $agent->post("$base/$tool/",
                            [ 'file' => [$list],
                              'format' => 'txt',
                            ],
                            'Content_Type' => 'form-data');

# The job runs asynchronously: poll until the server stops sending Retry-After.
while (my $wait = $response->header('Retry-After')) {
  print STDERR "Waiting ($wait)...\n";
  sleep $wait;
  $response = $agent->get($response->base);
}

$response->is_success ?
  print $response->content :
  die 'Failed, got ' . $response->status_line .
    ' for ' . $response->request->uri . "\n";

I don't know Perl (and would prefer not having to learn it). I can see that it submits the required identifiers with an HTTP POST, but I can't figure out what the HTTP POST is supposed to contain (it would be really helpful if I had a URL that I could paste into my browser to manually download the file). I tried using Fiddler to intercept the request, but Strawberry Perl manages to circumvent Fiddler.
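To make the contents of that POST visible without intercepting traffic, here is a minimal Python sketch (standard library only) that builds the same kind of multipart/form-data body the Perl script sends: one uploaded file under the field name `file` plus a plain `format` field, both taken from the script above. The function names, the file name `accessions.txt`, and the `text/plain` part type are illustrative assumptions, and the service URL is left as a parameter because the script's `$base` is empty in this post.

```python
import urllib.request
import uuid


def build_multipart_body(fields, file_field, file_name, file_bytes):
    """Encode plain form fields plus one file upload as multipart/form-data.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(("--" + boundary).encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(str(value).encode())
    disposition = (
        f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"'
    )
    parts.append(("--" + boundary).encode())
    parts.append(disposition.encode())
    parts.append(b"Content-Type: text/plain")  # assumption: IDs are plain text
    parts.append(b"")
    parts.append(file_bytes)
    parts.append(("--" + boundary + "--").encode())
    parts.append(b"")
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(parts)


def make_post_request(url, content_type, body):
    """Wrap the body in a POST request object; nothing is sent here."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )


if __name__ == "__main__":
    ids = "\n".join(["P12345", "P69905", "A2BC19"]).encode()
    content_type, body = build_multipart_body(
        {"format": "txt"}, "file", "accessions.txt", ids
    )
    print(content_type)
    print(body.decode())
```

Printing the body shows exactly what goes over the wire. Because the request is a file upload in a POST body, it cannot be reproduced by pasting a URL into a browser. Any HTTP client that supports multipart form uploads, in C# or elsewhere, should be able to send an equivalent request.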

How would I accomplish the same thing this code is doing in C#? What is the correct way to submit the web request?
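The second half of the Perl script is a polling loop: while the response carries a `Retry-After` header, it sleeps for that many seconds and fetches the response's URL again. A hedged Python sketch of that loop follows. `wait_for_result`, `refetch`, and `max_polls` are names invented here, and the HTTP calls are passed in as functions so the logic does not depend on any particular client library.

```python
import time


def wait_for_result(first_response, refetch, sleep=time.sleep, max_polls=60):
    """Poll while the server keeps answering with a Retry-After header.

    first_response and the values returned by refetch(response) must expose a
    `headers` mapping. Retry-After is assumed to be a number of seconds, as
    the Perl script assumes; an HTTP-date value is not handled.
    """
    response = first_response
    for _ in range(max_polls):
        wait = response.headers.get("Retry-After")
        if not wait:
            return response  # no Retry-After: the result is ready
        sleep(int(wait))
        response = refetch(response)
    raise TimeoutError("job still not finished after max_polls attempts")
```

With `urllib.request`, `refetch` could be something like `lambda r: urllib.request.urlopen(r.url)`, which mirrors `$agent->get($response->base)`. The same loop translates directly to C#: check the header, sleep, and GET the same URL again.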

perl webservice uniprot • 8.6k views
modified 3.3 years ago by Natasha (40) • written 6.8 years ago by Superbest (120)

how about using ?

written 6.8 years ago by Pierre Lindenbaum (131k)

I wish to do it programmatically, not manually.

written 6.8 years ago by Superbest (120)

You can do this with a one-liner from the UCSC utilities. Once you have downloaded them, use faSomeRecords:

faSomeRecords in.fa listFile out.fa
written 3.3 years ago by Chirag Parsania (1.9k)
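For readers who would rather do this filtering in code than with an external tool, here is a rough Python sketch of the same idea: keep only the FASTA records whose name appears in a list. It assumes, as I understand faSomeRecords to do, that a record's name is the first whitespace-delimited word after `>`. The function name `filter_fasta` is made up for illustration.

```python
def filter_fasta(fasta_lines, wanted_names):
    """Yield the lines of every record whose name is in wanted_names.

    The record name is taken to be the first word after '>' on the header line.
    """
    wanted = set(wanted_names)
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            keep = bool(words) and words[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    # Same three arguments as the faSomeRecords command above.
    with open("listFile") as names, open("in.fa") as src, open("out.fa", "w") as out:
        wanted = [n.strip() for n in names if n.strip()]
        out.writelines(filter_fasta(src, wanted))
```

Note that this requires a local FASTA file containing the sequences. With UniProt-style headers, the name is the whole first word (for example `sp|P12345|AATM_RABIT`), not the bare accession.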
6.8 years ago by bilouorama (0) wrote:

Another way is simply to fetch each entry's page with libcurl or another library. It is slow because each file is requested separately.

You can also get the pages with wget or curl through the batch retrieve service:

With this command I get all the sequences in FASTA format :-)

I don't know whether C# has a library like libcurl for C, but I hope so.

modified 6.8 years ago • written 6.8 years ago by bilouorama (0)

This code, taken from the "Mapping database identifiers" section, maps identifiers from one database to another. I want to download the fasta sequences corresponding to the identifiers.

written 6.8 years ago by Superbest (120)

You're right... I was confused. I have changed my answer.

written 6.8 years ago by bilouorama (0)
6.8 years ago by Kenosis (1.2k) wrote:

Perhaps the following will be helpful:

use strict;
use warnings;
use LWP::Simple;

@ARGV == 2 or die "Usage: perl $0 accessions.txt outFile.fasta\n";

my $dashes  = '-' x 25 . "\n";
my $outFile = pop;
# Read the accessions (one per line) into a hash for fast lookup.
my %list    = map { chomp; $_ => 1 } <>;

# Build an "OR" query from all accessions and fetch the records in one request.
my $accessions = join '+OR+', keys %list;
my $fastaRecs =
  get "$accessions&force=yes&format=fasta"
  or die "No records returned.\n";

open my $fh, '>', $outFile or die $!;

# Print only the records whose accession was requested, ticking each one off.
for ( split />/, $fastaRecs ) {
    if ( /\|(.+?)\|/ and exists $list{$1} ) {
        print $fh ">$_";
        delete $list{$1};
    }
}

close $fh;

print $dashes;

# Anything left in %list was not returned by the query.
if ( !keys %list ) {
    print "All accessions retrieved.\n";
}
else {
    print "Accessions not retrieved:\n";
    print "$_\n" for keys %list;
}

print $dashes;

Usage: perl accessions.txt outFile.fasta

The script builds an "OR" query from the accessions, writes only the requested records to the output file, and keeps track of which accessions were retrieved. It then prints a summary that either confirms all accessions were retrieved or lists the ones that were not.
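Since the thread asks how to do this outside Perl, here is a sketch of the same filtering logic in Python, with the download step left out. It assumes the FASTA text has already been fetched as one string. The function names are invented for illustration; the `+OR+` separator and the `|accession|` pattern are copied from the script above.

```python
import re


def build_or_query(accessions):
    """Join accessions with '+OR+', as the Perl script does."""
    return "+OR+".join(accessions)


def select_requested(fasta_text, accessions):
    """Keep only the records whose accession was requested.

    Returns (kept_fasta_text, missing_accessions). Unlike the Perl hash,
    the missing accessions come back in their original order.
    """
    remaining = dict.fromkeys(accessions)  # ordered and de-duplicated
    kept = []
    for record in fasta_text.split(">"):
        match = re.search(r"\|(.+?)\|", record)
        if match and match.group(1) in remaining:
            kept.append(">" + record)
            del remaining[match.group(1)]
    return "".join(kept), list(remaining)


if __name__ == "__main__":
    text = ">sp|P12345|AATM_RABIT example\nMALL\n>sp|Q00000|OTHER example\nMK\n"
    kept, missing = select_requested(text, ["P12345", "A2BC19"])
    print(kept, end="")
    print("Accessions not retrieved:", missing)
```

The sequences in the usage example are placeholders, not real protein sequences.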

For example, when accessions.txt contained:


the output file contains:

>sp|P12345|AATM_RABIT Aspartate aminotransferase, mitochondrial OS=Oryctolagus cuniculus GN=GOT2 PE=1 SV=2
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens GN=HBA1 PE=1 SV=2
>tr|A2BC19|A2BC19_HELPX GTPase (Fragment) OS=Helicobacter pylori GN=yphC PE=4 SV=1

Although you may also just enjoy the "Retrieve" tab at

modified 6.8 years ago • written 6.8 years ago by Kenosis (1.2k)

As I posted in the original question, I already have a Perl script to accomplish this task. I am trying to understand how to do it in other languages, such as C#.

written 6.8 years ago by Superbest (120)
3.3 years ago by Natasha (40) wrote:

Hello Everyone!

I would like to convert the Python script at this link so that it does the same task as the Perl script posted here. That is, I have a text file listing UniProt IDs, and I wish to obtain the protein sequences in a FASTA file using a Python script. After that, I will use each sequence to calculate its molecular weight; I already have a Python script for the molecular weight calculation.

Could someone help me translate the Perl script into Python?

Thanks a lot for your time and kind consideration.


written 3.3 years ago by Natasha (40)
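For the step between downloading the FASTA file and the molecular weight calculation, something like the following sketch could turn the file into (accession, sequence) pairs. It assumes UniProt-style headers of the form `>sp|P69905|HBA_HUMAN ...` and falls back to the first word of the header otherwise. `read_fasta` is a hypothetical helper, not part of any script in this thread.

```python
def read_fasta(lines):
    """Yield (accession, sequence) pairs from an iterable of FASTA lines."""
    accession, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            header = line[1:]
            parts = header.split("|")
            if len(parts) >= 3:
                accession = parts[1]  # db|ACCESSION|ENTRY_NAME ...
            else:
                accession = (header.split() or [""])[0]
            chunks = []
        elif line and accession is not None:
            chunks.append(line)  # sequence may span several lines
    if accession is not None:
        yield accession, "".join(chunks)


if __name__ == "__main__":
    with open("outFile.fasta") as handle:
        for acc, seq in read_fasta(handle):
            print(acc, len(seq))  # pass seq to the molecular weight script here
```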