What exactly are the NCBI time limits for jobs submitted from a single IP?
3
10.9 years ago
pld 5.1k

I have several instances of the same script that I would like to run against NCBI. Given that NCBI requires a 3-second pause between queries, would it be acceptable to run multiple scripts at once? In other words, is the scope of the time limit the script, or is it global (associated with a given IP address)?

python biopython ncbi • 4.3k views
2
10.9 years ago

The time limits are there for a good reason.

If putting a 3-second delay between requests is a problem, then it means you potentially want to run far too many jobs - after all, even with the delay that adds up to almost thirty thousand requests per day (one every 3 seconds over an 86,400-second day is 28,800).

If that is not what you are aiming to do, then just follow the recommendations; in the grand scheme of things it is a minor inconvenience.
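In practice the recommended pause is a one-liner. A minimal sketch of a paced query loop (the `request` callable and `run_paced` name are my own; `request` stands in for whatever Entrez call the script actually makes):

```python
import time

def run_paced(queries, request, pause=3.0):
    """Call `request` on each query, pausing between consecutive requests
    as the NCBI guidelines ask."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(pause)  # pause only *between* requests, not before the first
        results.append(request(query))
    return results

# e.g. run_paced(["BRCA1[gene]", "TP53[gene]"], my_entrez_call)
```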

2
10.9 years ago
Hamish ★ 3.2k

The NCBI usage guidelines depend on the service; the E-utilities, for example, have their own usage policy.

I suggest you read the appropriate guidelines carefully, and consider using the batch submission options if they suit your case, rather than attempting to run parallel jobs against the NCBI's services.

Blocking is done based on the information available to NCBI, which typically means that the IP address seen by NCBI will be blocked. Where a registered application is being used, only that specific tool may be blocked, but that requires registering with NCBI.
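Since the limit attaches to the IP address, several parallel scripts on one machine must draw on a single shared budget rather than each pacing itself independently. A minimal sketch of one way to share that budget, using a `multiprocessing` lock and a shared timestamp (the function name and structure are my own, not anything NCBI provides):

```python
import time
from multiprocessing import Lock, Value

def throttled_request(lock, last_request, make_request, min_interval=3.0):
    """Issue one request while respecting an interval shared across processes.

    NCBI applies its limit per IP address, so every worker process must be
    handed the *same* `lock` and `last_request` (a multiprocessing.Value);
    a per-process delay would still exceed the global limit.
    """
    with lock:
        wait = min_interval - (time.monotonic() - last_request.value)
        if wait > 0:
            time.sleep(wait)
        last_request.value = time.monotonic()
    return make_request()

# Usage sketch: create Lock() and Value("d", 0.0) once in the parent,
# pass both to every Process, and call throttled_request(...) in place
# of a bare Entrez query.
```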

As Istvan says, consider carefully what you are doing before running the risk of your access to NCBI being blocked.

You may also want to consider whether your queries can be run using services from other providers, with usage restrictions better suited to your workload. Checking web service registries such as BioCatalogue might suggest alternatives. EMBL-EBI provide a handy list of web service registries you might find helpful: http://www.ebi.ac.uk/Tools/webservices/tutorials/05_registries.

1
10.9 years ago
SES 8.6k

It's not clear which NCBI service you are referring to in your post, but as you can see in the NCBI E-utilities guidelines linked by Hamish, a user may submit up to 3 URL requests per second (i.e., not 1 every 3 seconds). There are also guidelines there for dealing with a large number of queries, such as using the Entrez History server, and a sample script is provided in the E-utilities Sample Applications book. Also, as Hamish points out, there are likely multiple services you can use to accomplish your task.
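For a large ID list, the History-server approach amounts to uploading the IDs once (EPost) and then fetching records in batches rather than one request per record. The batching step itself can be sketched as follows (the `batched` helper is my own, and the batch size of 200 is an assumption drawn from the E-utilities examples):

```python
def batched(ids, size=200):
    """Yield successive batches of IDs, one batch per EFetch call."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Each batch would then back a single request, e.g. (with Biopython)
# Entrez.efetch(db="protein", id=",".join(batch), rettype="fasta")
```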

My solution to these problems is often not to use the web services at all if the job will take a long time. Depending on what you are trying to accomplish, it may be more sensible to download the databases or flat files you are querying and run your jobs locally (in parallel). I have found this approach convenient because you are not subject to network or server downtime. I'm sure Biopython's E-utils methods (like BioPerl's) allow you to specify a file instead of a database, so the script you are using will likely work with this approach. You can get many databases from the NCBI FTP site, but keep in mind that your local copy may be out of date the next time you need it.

