Progress bar to measure blast running time?
4.7 years ago
Expe ▴ 10

Hello,

I wonder whether it is possible to use a progress bar (or a percentage) to show how long BLAST takes to run in my Python script. If so, how can I do it? I would like it to appear in the terminal so that users of the script know it is going to take a while and don't despair.

What I could find involves loops, and in my case I don't have one (it is a single line):

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)

Thank you!

blast biopython progressbar • 1.8k views

I am not sure this is even feasible! The running time depends strongly on how many hits there are, the length of the HSPs, the size of the database, and so on. I don't think you can "estimate" it. We can't compute this in one line of code with current software and hardware. :D

How about counters and time markers? For example:

  • checked <n> sequences
  • mm:ss from the start of the search
  • a random biology question to answer every 5 minutes, so that they might as well google it during the next five-minute wait
  • link to a random publication

just suggesting!


I don't think blast has an option to display "time remaining/progress". You could add a time counter to keep the users entertained in your script :)
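A time counter like that could look roughly like the sketch below. It starts the command with subprocess.Popen and rewrites one terminal line with the elapsed mm:ss until the process exits. The function name run_with_timer and its parameters are made up for illustration, and it assumes you launch BLAST as an external command rather than through the Biopython wrapper.

```python
import subprocess
import sys
import time


def run_with_timer(cmd, interval=1.0, stream=sys.stderr):
    """Run cmd and show elapsed mm:ss on one terminal line until it exits.

    Hypothetical helper: cmd is a list of command-line arguments.
    """
    start = time.time()
    process = subprocess.Popen(cmd)
    while process.poll() is None:  # None means the command is still running
        minutes, seconds = divmod(int(time.time() - start), 60)
        # "\r" returns to the start of the line so the counter overwrites itself
        stream.write("\rBLAST running... %02d:%02d elapsed" % (minutes, seconds))
        stream.flush()
        time.sleep(interval)
    stream.write("\n")
    return process.returncode
```

You would call it with the full command as a list, for example run_with_timer(["psiblast", "-query", "query.fasta", "-db", "my_db", "-outfmt", "5", "-out", "out.xml"]), where the file names are placeholders. It only shows elapsed time, not time remaining, and it can take up to one interval to notice that the command has finished.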

4.7 years ago
Daniel ★ 3.8k

I too am impatient and want to monitor how my runs are going. The comments above are correct that there's no built-in method, so what I normally do is use watch with either grep -c or wc -l, depending on the output format.

But if you want to do it in your script, either read in the output file and increment a count of "^Query=" matches (for standard BLAST format) or, if you're feeling lazy and don't want to read the whole file in, make a system call to grep to count them, something like:

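from subprocess import run, PIPE
# grep -c prints the number of matching lines; call() would only return grep's exit status
result = run(["grep", "-c", "^Query=", outfile], stdout=PIPE)
progressCounter = int(result.stdout or 0)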

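If you're using tabular output (-m 8 in legacy BLAST, outfmt=6 in BLAST+) then you could just count the number of lines (or, more precisely, the distinct query IDs in the first column, since each hit gets its own line), and in XML there's an equivalent.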

Then either output:

print("Blasting " + str(len(query)) + " sequences. Have completed " + str(progressCounter))

or do a % calc like

print("Blast is " + str(100.0 * progressCounter / len(query)) + "% complete")

Just some ideas.
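Put together without the grep call, those ideas might look like the sketch below. It counts the sequences in the query FASTA file, counts marker lines in the BLAST output written so far, and turns the two into a percentage. The function names and the default marker "Query=" (standard text output) are assumptions for illustration. For other formats you would need to pass a marker that matches your own output.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file (lines starting with '>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_completed(path, marker="Query="):
    """Count output lines that start with marker (leading spaces ignored)."""
    try:
        with open(path) as handle:
            return sum(1 for line in handle if line.lstrip().startswith(marker))
    except IOError:  # the output file has not been created yet
        return 0


def percent_done(query_fasta, blast_out, marker="Query="):
    """Return the share of queries seen in the output so far, as a percentage."""
    total = count_fasta_records(query_fasta)
    if total == 0:
        return 0.0
    return 100.0 * count_completed(blast_out, marker) / total
```

You could call percent_done every few seconds from a loop while BLAST runs in a separate process. The figure is approximate: it includes the query currently being written, and it only moves as BLAST writes results to the file, which may happen in bursts.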


Very nice idea. By the way, does the XML/ASN.1 format report query identifiers that don't have any matches in the database? I know that outfmt=6/7 exclude these queries from the output.

4.7 years ago
st.ph.n ★ 2.6k

You could add a print line before you run blast, and verify that it completed:

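#!/usr/bin/env python
import os
import sys
from Bio.Blast.Applications import NcbipsiblastCommandline

name = "youroutputblast"  # placeholder path for the BLAST output file
print("Running BLAST. Please do not interrupt...")

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)
stdout, stderr = blast()  # actually runs the command built above

# Check that the output file exists and is not empty
if os.path.exists(name) and os.stat(name).st_size > 0:
    print("BLAST complete.")
else:
    print("BLAST failed. The output is empty")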

At the very least, a glance at the terminal will tell you that BLAST is still running.
