Question: NCBI E-utilities: HTTP POST not working for EPost
pawan.samdani (Australia) wrote 4.3 years ago:

I am trying to download FASTA sequences for a list of protein GIs (about 100,000). My plan is to first upload the list of GIs with EPost over HTTP POST and then download the FASTA records with EFetch.

When I upload the list of GIs with EPost over HTTP POST, the server returns an empty response body (no WebEnv or query_key is generated).

The code I am using for EPost is:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url  = $base . 'epost.fcgi';

# form body: every GI is sent as its own "id" parameter
my $url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090";

# create HTTP user agent
my $ua = LWP::UserAgent->new;

# create HTTP request object
my $req = HTTP::Request->new(POST => $url);
$req->content_type('application/x-www-form-urlencoded');
$req->content($url_params);

# post the HTTP request
my $response = $ua->request($req);
print $response->content;

Running this prints nothing, although it should print the XML containing the WebEnv and the query_key. The HTTP status is 200 OK.

If I change $url_params and remove all GIs from it:

$url_params = "db=protein";

I get the following output:

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
    <ERROR>Empty ID list; Nothing to store</ERROR>
</ePostResult>
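
To tell the two cases apart (an empty body versus a reply carrying an <ERROR> element), a small check along these lines could help. This is only a minimal sketch: the helper names epost_error and describe_epost_reply are made up for illustration and are not part of LWP or the E-utilities, and the regex assumes the error is reported in an <ERROR> element as in the output above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <ERROR> element in an
# ePostResult document, or undef if there is none.
sub epost_error {
    my ($xml) = @_;
    my ($message) = $xml =~ m{<ERROR>(.*?)</ERROR>}s;
    return $message;
}

# Hypothetical helper: classify a reply body as empty, an error, or usable.
sub describe_epost_reply {
    my ($xml) = @_;
    return 'empty body' unless defined $xml && $xml =~ /\S/;
    my $error = epost_error($xml);
    return defined $error ? "error: $error" : 'ok';
}
```

Calling describe_epost_reply($response->content) right after the request would show whether the server sent nothing at all or sent an error message.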

I have no idea what the problem is or why the server isn't generating the WebEnv and query_key.

If anyone knows the solution, please help me out.

Tags: epost, perl, ncbi, e-utilities
piet (planet earth) wrote 4.3 years ago:

$url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090";

You have to pass multiple IDs as a comma-separated list:

$url_params = "db=protein&id=830003112,830003110,830003108,830003106,830003104,830003102,830003100,830003098,830003096,830003094,830003092,830003090";

See also the example on the NCBI site: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EPost
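
Put together, an EPost call over HTTP POST with a comma-separated id list might look roughly like the sketch below. The helper names (build_epost_body, parse_epost_result, epost) are made up for this example, the https form of the base URL is assumed, and the regexes assume the reply carries <QueryKey> and <WebEnv> elements. It has not been run against the live server here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the form body with all IDs in ONE
# comma-separated "id" parameter (no repeated "&id=").
sub build_epost_body {
    my ($db, @ids) = @_;
    return "db=$db&id=" . join(',', @ids);
}

# Hypothetical helper: pull query_key and WebEnv out of the EPost XML reply.
# Either value is undef if the element is missing.
sub parse_epost_result {
    my ($xml) = @_;
    my ($query_key) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($web_env)   = $xml =~ m{<WebEnv>([^<]+)</WebEnv>};
    return ($query_key, $web_env);
}

# Send the body by HTTP POST and return (query_key, WebEnv).
# LWP is loaded only when this is called, so the helpers above work offline.
sub epost {
    my ($db, @ids) = @_;
    require LWP::UserAgent;
    require HTTP::Request;
    my $url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi';
    my $req = HTTP::Request->new(POST => $url);
    $req->content_type('application/x-www-form-urlencoded');
    $req->content(build_epost_body($db, @ids));
    my $response = LWP::UserAgent->new->request($req);
    die 'EPost failed: ' . $response->status_line unless $response->is_success;
    return parse_epost_result($response->content);
}
```

A call such as my ($query_key, $web_env) = epost('protein', 830003112, 830003110); would then give the two values needed for a later EFetch.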


The comma-separated list is for EPost over HTTP GET, which has a limit of 200 GIs per query. I want to query a million GIs, and for that HTTP POST should be used. Please read http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_EPost_ (under required parameters, id).

In an HTTP POST query, each id is separated by '&id='. This is shown in sample application 4: http://www.ncbi.nlm.nih.gov/books/NBK25498/#_chapter3_Application_4_Finding_unique_se_

(written 4.3 years ago by pawan.samdani)

There is no difference in the format of the data between a GET and a POST request. For some reason the NCBI server expects a comma-separated list for id and fails if '&id=' is given multiple times. Maybe the NCBI server does not conform to the CGI standard.

I guess that the problem is with Perl. Please pass your comma-separated GI list as a plain string to the $req instance. If you pass the GIs as a list or as an associative array/hash, Perl will presumably turn them into multiple '&id=' parameters.

(written 4.3 years ago by piet)

You are right, NCBI does not accept '&id='. It worked as soon as I converted the IDs to a comma-separated list.

It seems they have given incorrect code in their sample application 4 for HTTP POST.

Thanks a lot! 

(written 4.3 years ago by pawan.samdani)

I'd be curious to know whether the system itself will allow you to pass 1 million GIs in a single Entrez POST request (once you figure out how to generate the right format). Let us know.

(written 4.3 years ago by Istvan Albert)

I have successfully passed 100,000 GIs with one POST request. Will try 1 million soon.

(written 4.3 years ago by pawan.samdani)
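
If a single request turns out to be too large, one option is to split the GI list into batches for upload and to page through the stored set on download. The sketch below only illustrates that idea: chunk_ids and efetch_url are hypothetical helper names, the batch sizes are arbitrary, the https base URL is assumed, and nothing in this thread confirms where the server's actual limit lies. The EFetch parameters used (query_key, WebEnv, retstart, retmax, rettype, retmode) are the documented ones.

```perl
use strict;
use warnings;

# Hypothetical helper: split a long ID list into batches of at most $size,
# returned as a list of array references.
sub chunk_ids {
    my ($size, @ids) = @_;
    my @batches;
    while (@ids) {
        push @batches, [ splice(@ids, 0, $size) ];
    }
    return @batches;
}

# Hypothetical helper: EFetch URL for one slice (retstart, retmax) of a set
# stored on the history server, requesting FASTA as plain text.
sub efetch_url {
    my ($db, $query_key, $web_env, $retstart, $retmax) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=$db&query_key=$query_key&WebEnv=$web_env"
         . "&retstart=$retstart&retmax=$retmax&rettype=fasta&retmode=text";
}
```

Each batch from chunk_ids would go into its own EPost request, and the FASTA records could then be fetched by stepping retstart through the set in increments of retmax.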

I'm kinda surprised to (still) see this working ... didn't NCBI switch to the https protocol some time ago? I would have thought the http approach would have been phased out by now.

(written 18 months ago by lieven.sterck)