What exactly are the NCBI time limits for jobs submitted from a single IP?
3
10.9 years ago
pld 5.1k

I have several instances of the same script that I would like to run against NCBI. Given that NCBI requires a 3-second pause between queries, would it be acceptable to run multiple scripts at once? In other words, is the scope of the time limit the script, or is it global (associated with a given IP address)?

python biopython ncbi • 4.3k views
2
10.9 years ago

The time limits are there for a good reason.

If putting a 3-second delay between requests is a problem, then it means you potentially want to run far too many jobs - after all, even with the delay that adds up to almost thirty thousand requests per day (one every 3 seconds over an 86,400-second day is 28,800).

If that is not what you are aiming to do, then just follow the recommendations; in the grand scheme of things it is a minor inconvenience.
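In practice the recommended pause is a one-liner. A minimal sketch of a paced query loop (the `request` callable and `run_paced` name are my own; `request` stands in for whatever Entrez call the script actually makes):

```python
import time

def run_paced(queries, request, pause=3.0):
    """Call `request` on each query, pausing between consecutive requests
    as the NCBI guidelines ask."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(pause)  # pause only *between* requests, not before the first
        results.append(request(query))
    return results

# e.g. run_paced(["BRCA1[gene]", "TP53[gene]"], my_entrez_call)
```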

2
10.9 years ago
Hamish ★ 3.2k

The NCBI usage guidelines depend on the service; the E-utilities, for example, have their own usage policy.

I suggest you read the appropriate guidelines carefully, and consider using the batch submission options if they suit your case, rather than attempting to run parallel jobs against the NCBI's services.

Blocking is done based on the information available to NCBI, which typically means that the IP address seen by NCBI will be blocked. Where a registered application is being used, only that specific tool may be blocked, but that requires registering with NCBI.
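Since the limit attaches to the IP address, several parallel scripts on one machine must draw on a single shared budget rather than each pacing itself independently. A minimal sketch of one way to share that budget, using a `multiprocessing` lock and a shared timestamp (the function name and structure are my own, not anything NCBI provides):

```python
import time
from multiprocessing import Lock, Value

def throttled_request(lock, last_request, make_request, min_interval=3.0):
    """Issue one request while respecting an interval shared across processes.

    NCBI applies its limit per IP address, so every worker process must be
    handed the *same* `lock` and `last_request` (a multiprocessing.Value);
    a per-process delay would still exceed the global limit.
    """
    with lock:
        wait = min_interval - (time.monotonic() - last_request.value)
        if wait > 0:
            time.sleep(wait)
        last_request.value = time.monotonic()
    return make_request()

# Usage sketch: create Lock() and Value("d", 0.0) once in the parent,
# pass both to every Process, and call throttled_request(...) in place
# of a bare Entrez query.
```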

As Istvan says, consider carefully what you are doing before running the risk of your access to NCBI being blocked.

You may also want to consider whether your queries can be run using services from other providers, with usage restrictions better suited to your workload. Checking web service registries such as BioCatalogue might suggest alternatives. EMBL-EBI provide a handy list of web service registries you might find helpful: http://www.ebi.ac.uk/Tools/webservices/tutorials/05_registries.

1
10.9 years ago
SES 8.6k

It's not clear which NCBI service you are referring to in your post, but as you can see in the NCBI E-utilities guidelines linked by Hamish, a user may submit up to 3 URL requests per second (i.e., not 1 every 3 seconds). There are also guidelines there for dealing with a large number of queries, such as using the Entrez History server, and a sample script is provided in the E-utilities Sample Applications book. Also, as Hamish points out, there are likely multiple services you can use to accomplish your task.
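For a large ID list, the History-server approach amounts to uploading the IDs once (EPost) and then fetching records in batches rather than one request per record. The batching step itself can be sketched as follows (the `batched` helper is my own, and the batch size of 200 is an assumption drawn from the E-utilities examples):

```python
def batched(ids, size=200):
    """Yield successive batches of IDs, one batch per EFetch call."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Each batch would then back a single request, e.g. (with Biopython)
# Entrez.efetch(db="protein", id=",".join(batch), rettype="fasta")
```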

My solution to these problems is often not to use the web services at all if the job will take a long time. Depending on what you are trying to accomplish, it may be more sensible to download the databases or flat files you are querying and run your jobs locally (in parallel). I have found this approach convenient because you are not subject to network or server downtime. I'm sure Biopython's E-utils methods (like BioPerl's) allow you to specify a file instead of a database, so the script you are using will likely work with this approach. You can get many databases from the NCBI FTP site, but keep in mind that your local copy may be out of date the next time you need it.

