Progress bar to measure blast running time?
4.7 years ago
Expe ▴ 10

Hello,

Is it possible to show a progress bar (or a percentage) for a BLAST run launched from my Python script? If so, how? I would like it to appear in the terminal so that users of the script know the search is going to take a while and don't get impatient.

Everything I could find relies on loops, and in my case there is no loop (the search is a single line):

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)


Thank you!

blast biopython progressbar • 1.8k views

I am not sure this is even feasible! The running time depends strongly on the number of hits, the length of the HSPs, the size of the database, and so on, so I don't think you can estimate it up front. We can't compute this in one line of code with current software and hardware. :D

How about counters and time markers? For example:

• checked <n> sequences
• mm:ss from the start of the search
• a random biology question to answer every 5 minutes, so that they might as well google it during the next five-minute wait
• link to a random publication

just suggesting!


I don't think blast has an option to display "time remaining/progress". You could add a time counter to keep the users entertained in your script :)
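A time counter along these lines could be sketched as follows. This is a minimal sketch under assumptions, not a tested recipe for BLAST: it assumes the search is started as an external process with `subprocess.Popen` so that the script stays free to print while it runs, and `format_elapsed`, `run_with_timer` and the `cmd` argument are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import time


def format_elapsed(seconds):
    """Format a number of seconds as mm:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return "%02d:%02d" % (minutes, secs)


def run_with_timer(cmd, interval=1.0, stream=sys.stderr):
    """Run cmd (a list of arguments), print elapsed time until it exits,
    and return the process exit code."""
    start = time.time()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:  # None means the process is still running
        # "\r" returns to the start of the line so the counter overwrites itself
        stream.write("\rBLAST running... " + format_elapsed(time.time() - start))
        stream.flush()
        time.sleep(interval)
    stream.write("\nFinished after " + format_elapsed(time.time() - start) + "\n")
    return proc.returncode
```

With the Biopython wrapper from the question, `str(blast)` gives the command line as a string, so something like `run_with_timer(shlex.split(str(blast)))` should fit, although that combination is an assumption here rather than something checked against a real database.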

4.7 years ago
Daniel ★ 3.8k

I too am impatient and want to monitor how my runs are going. The comments above are correct that there is no built-in method, so what I normally do is use watch with either grep -c or wc -l on the output file, depending on the output format.

If you want to do it inside your script, either read the output file and increment a count for every line matching "^Query=" (for the standard BLAST format) or, if you are feeling lazy and don't want to read the whole file in, make a system call to grep and take the count from its output:

from subprocess import run, PIPE
# grep -c prints the number of matching lines; call() would only return grep's exit status
progressCounter = int(run(["grep", "-c", "^Query=", outfile], stdout=PIPE, universal_newlines=True).stdout)


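If you're using tabular output (outfmt=6 in BLAST+, -m 8 in legacy blastall) then you could just count the number of lines, and in XML there's an equivalent per-query tag to count (for example <Iteration_query-def>).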

Then either output:

print("Blasting " + str(len(query)) + " sequences. Have completed " + str(progressCounter))


or do a % calc like

# multiply by 100.0 first so the division is not truncated to an integer
print("Blast is " + str(100.0 * progressCounter / len(query)) + "% complete")


Just some ideas.
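Putting those ideas together without the grep call, a pure-Python version might look like the sketch below. The helper names (`count_fasta_records`, `count_marker_lines`, `percent_complete`) are hypothetical, and it assumes the query is a FASTA file and that BLAST writes one marker line per query ("Query=" in the standard format, or a tag such as "<Iteration_query-def>" in XML). The count is only approximate, since output may be buffered and a multi-iteration PSI-BLAST run could plausibly write more than one marker per query.

```python
def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_marker_lines(out_path, marker="Query="):
    """Count output lines that start with marker (ignoring indentation).

    Returns 0 if the output file has not been created yet.
    """
    try:
        with open(out_path) as handle:
            return sum(1 for line in handle if line.lstrip().startswith(marker))
    except IOError:  # same as OSError on Python 3
        return 0


def percent_complete(done, total):
    """Return progress as a percentage, capped at 100."""
    if total <= 0:
        return 0.0
    return min(100.0, 100.0 * done / total)
```

Called periodically while the search runs, for instance as `percent_complete(count_marker_lines(name), count_fasta_records(sys.argv[1]))`, this gives the number to print. Reading the file line by line keeps memory use low even when the output is large.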


Very nice idea. By the way, do the XML/ASN.1 formats report query identifiers that have no matches in the database? I know that outfmt=6/7 exclude these queries from the output.

4.7 years ago
st.ph.n ★ 2.6k

You could add a print line before you run blast, and verify that it completed:

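#!/usr/bin/env python
import os
import sys
from Bio.Blast.Applications import NcbipsiblastCommandline

# `name` is the output file path, defined earlier in your script
print("Running BLAST. Please do not interrupt...")

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)
stdout, stderr = blast()  # runs psiblast and blocks until it finishes

# an empty output file means the search produced nothing
if os.stat(name).st_size > 0:
    print("BLAST complete.")
else:
    print("BLAST failed. The output is empty")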


At the very least, a glance at the terminal will tell the user that BLAST is still running.