Progress bar to measure blast running time?
6.6 years ago
Expe ▴ 10


I wonder if it is possible to use a progress bar (or a percentage) to show how long BLAST takes to run in my Python script. If so, how do I do it? I would like it to appear in the terminal so that users of the script know it is going to take a while and don't despair.

What I could find involves loops, and in my case I don't have one (it is a single line):

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)

Thank you!

blast biopython progressbar • 2.5k views

I am not sure this is even feasible! It strongly depends on the number of hits, the length of the HSPs, the size of the database, and so on, so I don't think you can "estimate" this. We can't compute it in one line of code with current software and hardware. :D

How about counters and time markers? For example:

  • checked <n> sequences
  • mm:ss from the start of the search
  • a random biology question to answer every 5 minutes, so that they might as well google it during the next five-minute wait
  • link to a random publication

just suggesting!


I don't think blast has an option to display "time remaining/progress". You could add a time counter to keep the users entertained in your script :)
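
A time counter along these lines could look like the sketch below. It assumes BLAST is launched as an external command through `subprocess.Popen` instead of through the Biopython wrapper, so that the script can keep printing while it waits. The helper names `format_elapsed` and `run_with_timer` are made up for illustration and are not part of the original script.

```python
import subprocess
import sys
import time


def format_elapsed(seconds):
    """Format a duration in seconds as mm:ss (minutes are not capped at 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return "{:02d}:{:02d}".format(minutes, secs)


def run_with_timer(cmd, poll_interval=1.0, stream=sys.stderr):
    """Run cmd (a list of arguments) and keep rewriting an elapsed-time line.

    Returns the exit code of the command once it has finished.
    """
    start = time.time()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:  # None means the process is still running
        stream.write("\rBLAST running... " + format_elapsed(time.time() - start))
        stream.flush()
        time.sleep(poll_interval)
    stream.write("\nFinished after " + format_elapsed(time.time() - start) + "\n")
    return proc.returncode
```

It might be called as `run_with_timer(["psiblast", "-query", sys.argv[1], "-db", "my_db", "-out", name])`, with the remaining options added to the list as needed. This only shows elapsed time, not time remaining, which matches the point above that BLAST does not report its own progress.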

6.6 years ago
Daniel ★ 3.9k

I too am impatient and want to monitor how my runs are going. While the comments above are correct that there's no built-in method, what I normally do is use watch with either grep -c or wc -l, depending on the output format.

But if you want to do it in your script, either read in the file and ++ a count of "^Query=" (for standard blast format) or if you were really lazy and also didn't want to read the whole file in you could do a system call to grep the number of something like:

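import subprocess
# call() only returns grep's exit status, so capture the printed count instead
grep = subprocess.run(["grep", "-c", "^Query=", outfile], stdout=subprocess.PIPE)
progressCounter = int(grep.stdout.decode())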

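If you're using tabular output (outfmt=6 in BLAST+, -m 8 in legacy BLAST), you could count the distinct query IDs in the first column rather than raw lines, since each hit gets its own line. In XML there's an equivalent, such as counting <Iteration_query-def> tags.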

Then either output:

print("Blasting " + str(len(query)) + " sequences. Have completed " + str(progressCounter))

or do a % calc like

print("Blast is " + str(100.0 * progressCounter / len(query)) + "% complete")

Just some ideas.
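
The same counting idea can be done in pure Python without calling grep. This is only a sketch: it assumes the standard text report (one line starting with `Query=` per reported query) and a FASTA query file (one `>` header per sequence), and the function names are hypothetical. The percentage is only as good as the assumption that BLAST writes results to the output file as queries finish.

```python
def count_lines_starting_with(path, prefix):
    """Count lines in a text file that begin with prefix (0 if the file is missing)."""
    try:
        with open(path) as handle:
            return sum(1 for line in handle if line.startswith(prefix))
    except IOError:  # the output file has not been created yet
        return 0


def percent_complete(query_fasta, blast_output):
    """Estimate progress as reported queries / total query sequences, in percent."""
    total = count_lines_starting_with(query_fasta, ">")
    done = count_lines_starting_with(blast_output, "Query=")
    if total == 0:
        return 0.0
    # Cap at 100 in case the marker appears more often than expected.
    return min(100.0, 100.0 * done / total)
```

Calling `percent_complete(sys.argv[1], name)` every few seconds from a polling loop, while BLAST runs in a separate process, would give a rough percentage to print in the terminal.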


Very nice idea. By the way, does the XML/ASN.1 format report query identifiers that don't have any matches in the database? I know that outfmt=6/7 exclude these queries from the output.

6.6 years ago ★ 2.7k

You could add a print line before you run blast, and verify that it completed:

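#!/usr/bin/env python
import os
import sys
from Bio.Blast.Applications import NcbipsiblastCommandline

# `name` is the output file path, defined earlier in your script
print("Running BLAST. Please do not interrupt...")

blast = NcbipsiblastCommandline(query=sys.argv[1], db="my_db", evalue=0.0001, outfmt=5, out=name, num_alignments=100)
stdout, stderr = blast()  # runs psiblast and blocks until it finishes

# Check the same file that was passed as out= above
if os.path.exists(name) and os.stat(name).st_size > 0:
    print("BLAST complete.")
else:
    print("BLAST failed. The output is empty.")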

At the very least, a glance at the terminal will tell the user that BLAST is still running.

