Question: NCBI E-utilities: HTTP POST not working for EPost
3.8 years ago
pawan.samdani wrote:

I am trying to download FASTA sequences from a list of protein GIs (~ 100000). I planned to use EPost using HTTP POST to first upload the list of GIs and then use EFetch to download the FASTA.

I am getting no response from the server (WebEnv and query_key are not generated) when I upload the list of GIs with EPost using HTTP POST.

The code that I am using for EPost is:

use LWP::Simple; 
use LWP::UserAgent; 

$base = ''; 
$url = $base . "epost.fcgi"; 

$url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090"; 

#create HTTP user agent 
$ua = new LWP::UserAgent; 

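#create HTTP request object and attach the parameters as the POST body 
$req = new HTTP::Request POST => "$url"; 
$req->content_type('application/x-www-form-urlencoded'); 
$req->content("$url_params"); 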

#post the HTTP request 
$response = $ua->request($req); 
print $response->content; 

This prints nothing after the code is run. Ideally it should print the WebEnv and the query_key. The HTTP status is OK and the code is 200.

If I change the url_params and remove all GIs from it

$url_params = "db=protein";

I get the following output :

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "">
    <ERROR>Empty ID list; Nothing to store</ERROR>

I have no idea what the problem is or why the server isn't generating the WebEnv and query_key.

If anyone knows the solution please help me out.

epost perl ncbi e-utilities • 1.7k views
3.8 years ago
piet wrote:

$url_params = "db=protein&id=830003112&id=830003110&id=830003108&id=830003106&id=830003104&id=830003102&id=830003100&id=830003098&id=830003096&id=830003094&id=830003092&id=830003090";

You have to pass multiple IDs as a single comma-separated list:

$url_params = "db=protein&id=830003112,830003110,830003108,830003106,830003104,830003102,830003100,830003098,830003096,830003094,830003092,830003090";

see also the example on the NCBI site:

The comma-separated list is for EPost using HTTP GET, which has a limit of 200 GIs per query. I want to query a million GIs, and for that HTTP POST should be used. Please read :- (in required parameters - id).

In an HTTP POST query, each id is separated by '&id='. This is mentioned in sample application 4 :-

pawan.samdani wrote:

There is no difference in the format of the data between a GET and a POST request. For some reason the NCBI server expects a comma-separated list for id, and fails if '&id=' is given multiple times. Maybe the NCBI server is not in accordance with the CGI standard.

I guess that the problem is with Perl. Please pass your comma-separated GI list as a plain string to the $req instance. If you pass the GIs as a list or an associative array/hash, Perl will presumably expand them into multiple '&id=' parameters.
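
A minimal sketch of that advice, assuming the GIs are held in a Perl array: join them with commas into one id value and hand the resulting string to the request as its body. The helper name build_epost_body is hypothetical, and the three GIs are taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build an EPost form body with ONE id parameter
# whose value is a comma-separated list, instead of repeating '&id='.
sub build_epost_body {
    my ($db, @ids) = @_;
    return "db=$db&id=" . join(',', @ids);
}

my @gis  = (830003112, 830003110, 830003108);
my $body = build_epost_body('protein', @gis);
print "$body\n";   # db=protein&id=830003112,830003110,830003108

# The string is then attached to the POST request as plain content, e.g.:
#   $req->content_type('application/x-www-form-urlencoded');
#   $req->content($body);
```

Because the body is built as one string, nothing downstream can split the list back into repeated '&id=' parameters.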

piet wrote:

You are right, NCBI does not accept '&id='. It worked as soon as I converted it to a comma-separated list.

It seems they have given incorrect code in their sample application 4 for HTTP POST.

Thanks a lot! 
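
For the follow-up EFetch step, the two values have to be pulled out of the EPost reply. A rough sketch, assuming the reply carries them in QueryKey and WebEnv elements (my understanding of the ePostResult format, not something shown in this thread); the helper name parse_epost_result and the sample reply are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull QueryKey and WebEnv out of an EPost XML reply.
# Returns an empty list if either element is missing (e.g. an <ERROR> reply).
sub parse_epost_result {
    my ($xml) = @_;
    my ($query_key) = $xml =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($web_env)   = $xml =~ m{<WebEnv>([^<]+)</WebEnv>};
    return unless defined $query_key && defined $web_env;
    return ($query_key, $web_env);
}

# Made-up reply for illustration only; the WebEnv value is not a real one.
my $sample = '<ePostResult><QueryKey>1</QueryKey>'
           . '<WebEnv>EXAMPLE_WEBENV</WebEnv></ePostResult>';
my ($key, $env) = parse_epost_result($sample);
print "query_key=$key WebEnv=$env\n";
```

Checking for an empty return value is a cheap way to catch replies like the "Empty ID list" error quoted in the question before moving on to EFetch.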

pawan.samdani wrote:

I'd be curious to know whether the system itself will allow you to pass 1 million GIs in a single Entrez POST request (once you figure out how to generate the right format). Let us know.

Istvan Albert wrote:

I have successfully passed 100,000 GIs with one POST request. Will try 1 million soon.
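
If a single request with a million GIs turns out to be too large, one fallback would be to split the list into batches and post each batch separately. A sketch under that assumption; the helper name chunk_ids is hypothetical, and the batch size of 2 is only there to keep the example output short.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into batches of at most $size,
# returning one comma-separated string per batch.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;
    my @batches;
    while (my @batch = splice(@ids, 0, $size)) {
        push @batches, join(',', @batch);
    }
    return @batches;
}

my @batches = chunk_ids(2, 830003112, 830003110, 830003108);
print "$_\n" for @batches;
# 830003112,830003110
# 830003108
```

Each returned string can be used as the id value of its own EPost request; a batch size of 100,000 would match the request size reported to work here.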

pawan.samdani wrote:

I'm kinda surprised to (still) see this working ... didn't NCBI switch to the https protocol some time ago? I would have thought the http approach would have been phased out by now.

lieven.sterck wrote:
