I'm new to Biopython, and programming in general, but I am trying to create a small script that will query a large number of RNA sequences for BLAST query. Right now, I'm using Biopython and qblast, but I'm finding that during certain times of day, it takes 7-8 minutes for a single query. Is there a better way to accomplish this, other than running BLAST locally? I've been told that we would like to avoid that as much as possible.
My code currently looks something like this:
for sequence in sequences:
while True:
try:
resultHandle = NCBIWWW.qblast("blastn","nr", sequence)
if serverWasDown:
print "Server is up and running again."
break
except:
print "Server connection lost, waiting 10 seconds to try agiain. Please make sure the computer has a working network connection."
serverWasDown = True
time.sleep(10)
I've been thinking that there must be a way to submit a batch query over NCBIWWW but I just can't find out exactly how to do that. I don't suppose you or anyone else has any ideas?
The NCBIWWW docs at biopython.org show a qblast function with a query_file parameter, but it runs as an HTTP GET with no provision to upload the file. Looks broken, unless the urlencoder has some hidden voodoo. So try using the query sequence like a fasta file: try querying
like a multifasta file. Beware packing the sequences into a GET request might cap out at under a kilobyte.
While unfortunately, I haven't been able to get that parameter to work (broken, as you say) I am instead setting up the nt and nr database locally as you suggested. I really appreciate the help!