Why doesn't BLAST cache previous queries?
8.4 years ago
roy.granit ▴ 880

I'm teaching a course on protein structure prediction using computational approaches. As part of the course, I ask the students to submit a specific query to blastp. Just out of curiosity: why doesn't NCBI simply cache queries and display the saved output for queries that have already been made? (It could refresh the DB every now and then.) It just feels like a total waste of computation power and research time.

Looks like with an account you can save jobs:

http://www.ncbi.nlm.nih.gov/books/NBK153387/

http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=GetSaved&RECENT_RESULTS=on

Probably because it is more efficient to just run the search again than it is to store the results.

Besides, if I enter a raw sequence without an accession, how does BLAST find out if there's already a result for that? It would have to check a database of previously submitted sequences, then if there wasn't a stored result, it would have to actually run BLAST. What do you do when people change options, add entrez queries, search different databases? What about when a query is an exact substring of another query, do you pull results for the preexisting query?

I'm not saying that it is impossible to do this, just that dealing with it might not be more efficient in terms of energy/time/cost.
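To make the bookkeeping above concrete, here is a minimal sketch of what a cache lookup would have to key on. Everything in it is hypothetical (the `cache_key` and `cached_search` names, the `db_version` field, the in-memory dict); it is not how NCBI handles requests, just an illustration of why options, database, and database version all have to be part of the key.

```python
import hashlib
import json


def cache_key(sequence, program, database, db_version, options=None):
    """Build a deterministic key for one search request (hypothetical scheme)."""
    # Normalize the raw sequence so whitespace and case do not change the key.
    normalized = "".join(sequence.split()).upper()
    payload = {
        "seq": normalized,
        "program": program,
        "db": database,
        "db_version": db_version,  # assumed label for the current database build
        "options": options or {},
    }
    # sort_keys makes the serialization independent of dict insertion order.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


_cache = {}  # stand-in for a real result store


def cached_search(run_search, sequence, program, database, db_version, options=None):
    """Return a stored result if the exact same request was seen, else run it."""
    key = cache_key(sequence, program, database, db_version, options)
    if key not in _cache:
        # run_search is a caller-supplied function that does the actual search.
        _cache[key] = run_search(sequence, program, database, options or {})
    return _cache[key]
```

Note that a query that is a substring of an earlier query gets a different key, so it is simply a cache miss here. Any change of option, database, or database build is a miss as well, which is part of why the hit rate might be too low to pay for the storage.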

I'm not sure how this feature really costs any "research time" to the user. If the user needs to reference the result(s) from blast frequently, the user should be saving the results locally.

I agree it's not a terrible hurdle to scientific research, but it still feels a bit weird that they don't offer such an option. Maybe the queries are so diverse that it's not worth the effort.

NCBI does save BLAST search results, but only for a short time (look for the RID on the BLAST results page). The retention period used to be longer; when I checked just now it was about 36 h. I remember reading in a recent presentation that NCBI runs about 60K blastp searches each day, so the volume is large. AFAIK, the NCBI BLAST databases are rebuilt each day, so the results could differ slightly from day to day depending on your query.

That said, NCBI probably has significant compute resources available (perhaps someone from NCBI on the forum can comment), and the searches we do on the web are probably run against BLAST indexes held in memory, so there is not that much computational overhead. NCBI servers are probably up 24x7x365 anyway.

I'm not sure about the 36 h: we just ran the same sequence on about 30 computers within minutes of each other, and it still looks like the results are recalculated every time.

@roy.granit: You can use the link posted by @joe.cornish826 and find the "request ID" of the search you ran (before class). You can then share that link (this is a dummy example: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=5SU55W2X01R ) with your class so they can pull up your search without having to run it again themselves.
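If you want to hand out such links programmatically, a small sketch along these lines would do. It only reproduces the URL pattern quoted above; the `rid_link` helper is a made-up name, and the RID is the same dummy value, so the printed link will not resolve to a real result.

```python
from urllib.parse import urlencode

# Base URL exactly as quoted in this thread.
BLAST_CGI = "http://blast.ncbi.nlm.nih.gov/Blast.cgi"


def rid_link(rid):
    """Build a shareable results link from a request ID (hypothetical helper)."""
    rid = rid.strip()
    if not rid:
        raise ValueError("empty RID")
    # CMD=Get and RID=... are the two query parameters from the example link.
    return BLAST_CGI + "?" + urlencode({"CMD": "Get", "RID": rid})


print(rid_link("5SU55W2X01R"))
# http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=5SU55W2X01R
```

Keep in mind that the link only works while NCBI still holds the result for that RID.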

Thanks, sounds like a great idea.

It is on a per-user basis.
