Show Progress (Bar?) For Blast+ Query
10.7 years ago

Hello everyone,

Is it possible to visualize the progress of a local BLAST+ run (a verbose mode or something similar, maybe in Python or Perl)?

Thanks in advance!

blast blast command-line • 7.3k views

Sorry for the delay. I liked Rm's approach, but there is a problem: the BLAST output file is not updated until the program finishes, so it's not possible to retrieve progress information in real time.

Any suggestions?


Instead of -out outputfile, redirect standard output with > outputfile; the file might then get updated in real time.


That did the trick :)
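
With the output redirected as suggested above, a reporting command can be re-run periodically for as long as the job is alive. The following is only a minimal sketch: the function name watch_until_done is made up for illustration, and the commented example assumes blastn and a database called nr1 are available.

```shell
# Hypothetical helper: re-run a reporting command while a job is alive.
# Usage: watch_until_done <pid> <seconds> <command> [args...]
watch_until_done() {
    pid=$1
    interval=$2
    shift 2
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        "$@"
        sleep "$interval"
    done
    "$@"    # one last report after the job has finished
}

# Example (assumes blastn and the nr1 database are available):
#   blastn -query input.fasta -db nr1 > blast_out.txt &
#   watch_until_done $! 30 grep -c '^Query=' blast_out.txt
```

The example prints the number of queries reported so far every 30 seconds; any of the percentage one-liners in this thread can be used as the reporting command instead.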


Update: I tried the script with HTML output, but it didn't work. Changing grep -c "^Query=" to grep -c "Query=" corrects that. I tried other -outfmt options, but no luck :(

The script will work on default blast+ output (outfmt=0) and html format.

I also changed the end of the script a bit:

totalcount=$(grep -c "^>" input.fasta)              # sequences in the query file
completed=$(grep -c "^Query=" blast_out.txt)       # queries already in the output
percent=$(echo "100*$completed/$totalcount" | bc)  # integer percentage
echo "From $totalcount sequences, $completed were processed ($percent%) :)"

Thanks for the support!

10.7 years ago

I don't think so, but I would be happy to be wrong, because I miss this feature too. What I do now (it works with tab-delimited output and a multi-FASTA query file) is find the last query ID written to the BLAST output and then locate it in the FASTA file. This works only because sequences are "blasted" in the same order as they appear in the query file.

## 1st step: last query ID written to the output
tail -n 1 out.tab.blast | cut -f 1 > lastqueryblasted

## 2nd step: line number of that ID in the query file
grep -n -F -f lastqueryblasted query.fasta
# res1, e.g. 23000

## last step: percentage computation
wc -l query.fasta
# res2, e.g. 33000

# your (under)estimated percentage
echo "(23000 / 33000) * 100" | bc -l

This is not very convenient I know, but it helps.

EDIT: I use this so often that I eventually wrapped it in a script.

If you name it blastab_monitor.sh, it works like this:

blastab_monitor.sh blastresult.blast queryfasta.fa

Here it is:

#!/bin/sh
### monitor a tabular-output blast job by printing the approximate % done
blast=$1
query=$2
echo "the blast out is: $blast"
echo "the fasta query is: $query"
echo
# query ID of the last hit written so far
curquery=$(tail -n 1 "$blast" | cut -f 1)
# line number of that ID in the query file (first match only)
curline=$(grep -n -F -- "$curquery" "$query" | head -n 1 | cut -f 1 -d ':')
nblines=$(wc -l < "$query")
percent=$(echo "($curline/$nblines) *100" | bc -l | cut -c 1-4)
echo "The blast job is about $percent % done..."

I hope that helps.
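
The line-based estimate above depends on how long each sequence is. A variant that counts FASTA headers instead reports the fraction of sequences reached. This is only a sketch under stated assumptions: the output is tabular (-outfmt 6) with the query ID in column 1, that ID equals the first word of the FASTA header, and the function name blast_seq_progress is made up for illustration.

```shell
# Hypothetical helper: percentage of query sequences reached so far.
# Usage: blast_seq_progress blastresult.blast queryfasta.fa
blast_seq_progress() {
    blast=$1
    query=$2
    # Query ID of the last hit written so far (column 1 of tabular output).
    curquery=$(tail -n 1 "$blast" | cut -f 1)
    awk -v id="$curquery" '
        /^>/ {
            total++
            # First word of the header, without the leading ">".
            name = substr($1, 2)
            if (name == id) pos = total
        }
        END {
            if (total == 0) { print "no sequences found"; exit 1 }
            printf "%d of %d sequences reached (%.1f %%)\n", pos, total, 100 * pos / total
        }' "$query"
}
```

Because the ID is compared as a whole word, an ID such as seq1 is not confused with seq10. Queries without any hit never appear in tabular output, so the figure is still an underestimate.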


Thank you so much. That worked perfectly for me for both the below cases.

  1. blastx:: Input: query.fasta, Output: -outfmt 6 > uniref90.blastx.outfmt6
  2. blastp:: Input: transdecoder.pep. Output: -outfmt 6 > uniref90.blastp.outfmt6

Thanks again,
Ravi


Hello, is there any way this could be rewritten for outfmt = 7 tabular format with comments?
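
One possible approach for -outfmt 7, offered as a sketch rather than a tested solution: in that format every query, with or without hits, gets a block of comment lines that includes a "# Query: <id>" line, so those lines can be counted much like "Query=" in the default output. The function name blast7_progress is made up for illustration.

```shell
# Hypothetical helper: progress for "-outfmt 7" (tabular with comment lines).
# Usage: blast7_progress blast_out.tab input.fasta
blast7_progress() {
    blast=$1
    query=$2
    total=$(grep -c '^>' "$query")
    # Each reported query contributes exactly one "# Query: <id>" line.
    reported=$(grep -c '^# Query: ' "$blast")
    if [ "$total" -eq 0 ]; then
        echo "no sequences found in $query" >&2
        return 1
    fi
    echo "$reported of $total queries reported ($((100 * reported / total)) %)"
}
```

Unlike the plain tabular case, queries with zero hits are counted too, since they still produce a comment block.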

10.7 years ago
Rm 8.2k

input fasta file: input.fasta

regular blast output: blast_out.txt

totalcount=$(grep -c "^>" input.fasta); completed=$(grep -c "^Query=" blast_out.txt) ; percent=100*$completed/$totalcount ; echo $percent | bc -l
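
Since the question asks for a bar, the two counts above can also be drawn as one. This is a minimal sketch; the function name draw_bar and the 50-character width are arbitrary choices made for illustration.

```shell
# Hypothetical helper: draw a 50-character progress bar on one line.
# Usage: draw_bar <completed> <total>
draw_bar() {
    completed=$1
    total=$2
    width=50
    if [ "$total" -le 0 ]; then
        echo "total must be positive" >&2
        return 1
    fi
    awk -v c="$completed" -v t="$total" -v w="$width" 'BEGIN {
        if (c > t) c = t                 # never draw past the end of the bar
        filled = int(w * c / t)
        bar = ""
        for (i = 0; i < w; i++) bar = bar (i < filled ? "=" : " ")
        printf "[%s] %d %%\n", bar, int(100 * c / t)
    }'
}

# Example with the file names used above:
#   draw_bar "$(grep -c '^Query=' blast_out.txt)" "$(grep -c '^>' input.fasta)"
```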

It might be good to state which output format is required for the Query grep to be correct. Default output, as well as HTML or default tabular output without interspersed query info, will likely make the calculation fail...


It needs the default BLAST output; it might work for HTML too (I haven't tested it, though).

8.1 years ago
alephreish ▴ 60

Here is a cross-platform solution in bash. The script indicates the percentage of rows in the input file consumed by blast in real time, which is a very good proxy of the process progress when the query is a large set of sequences. It's based on the idea that the input can be piped to blast, and the piping command can output the current status to a different stream.

Use as: blastprogress <your normal blast command with full set of options>

E.g.: blastprogress blastn -query my.fasta -num_threads 3 -outfmt 6 -evalue 1e-5 -db nr1 -out my.blast

blastprogress:

#!/bin/bash

die() {
  echo "$1" >&2
  exit 1
}

## test whether the command is correct ##
[[ "$1" == *blast* ]] || die "Not a blast program '$1'"
command -v "$1" >/dev/null 2>&1 || die "'$1' not found"

## grab the -query argument to replace it with our pipe ##
for ((j=$#;j>0;j--)); do
  if [ "${!j}" == '-query' ]; then
    i=$((j-1)); k=$((j+1)); l=$((j+2))
    query=${!k}
    set -- "${@:1:i}" "${@:l}"
    break
  fi
done

## validate the query ##
[ -f "$query" ] || die 'Input file not found'
lines=$(wc -l < "$query")
((lines>0)) || die 'Input file is empty'

## we need these two strings to plot the progress bar ##
bar='===================================================================================================='
blk='                                                                                                    '

echo "Lines consumed:" >&2
printf '[%.*s] %d %%\r' 100 "$blk" 0 >&2

## ival is the number of rows corresponding to 1% ##
ival=$((lines/100))
((ival==0)) && ival=1

## we use awk to monitor the number of lines consumed by blast ##
awk='{ print }
    NR%'$ival'==0 {
      p=sprintf("%.f", NR*100/'$lines');
      system("printf '"'[%.*s%.*s] %d %%\r'"' "p" '"'$bar'"' "(100-p)" '"'$blk'"' "p" >&2");
  }'

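## run blast; "$@" is called directly (no eval) so that a quoted ##
## multi-word value such as -outfmt "6 qseqid sseqid" stays one argument ##
"$@" -query <(awk "$awk" "$query")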

echo >&2
echo 'Done' >&2
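
Stripped of the bar drawing and argument handling, the core idea of the script above can be sketched as a small pass-through filter. The function name feed_with_progress and the default reporting step of 1000 lines are assumptions made for illustration, and the figure only says how many lines have been handed to the consumer, not how many have been fully processed.

```shell
# Hypothetical helper: copy a file to stdout unchanged while reporting, on
# stderr, the percentage of lines handed to the consumer so far.
# Usage: feed_with_progress <file> [report_every_n_lines]
feed_with_progress() {
    file=$1
    step=${2:-1000}
    lines=$(wc -l < "$file")
    awk -v total="$lines" -v step="$step" '
        { print }                                   # pass the line through
        total > 0 && NR % step == 0 {               # periodic status update
            printf "%d %% of lines consumed\r", NR * 100 / total > "/dev/stderr"
        }
        END { printf "100 %% of lines consumed\n" > "/dev/stderr" }
    ' "$file"
}

# Example in bash (assumes blastn and the nr1 database are available):
#   blastn -query <(feed_with_progress my.fasta) -db nr1 -outfmt 6 -out my.blast
```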

I run into trouble with the above if I try to set the output columns for -outfmt 6; for instance, I get the error "Too many positional arguments, the offending value sgi".

10.7 years ago
ALchEmiXt ★ 1.9k

On Unix systems you could use the "watch" command to display, for instance, the tail of your output file on screen. You can make it fancier by including a grep to show just the query ID and calculating the % as Rm suggested.

A simple solution when only one hit is returned per query in tabular form is to count the lines of the output.

watch tail -n 10 blast_out
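
When more than one hit per query is returned, the line count overshoots, but the distinct query IDs in the first column can be counted instead. A minimal sketch, assuming tabular output (-outfmt 6); the function name count_queries_with_hits is made up for illustration.

```shell
# Hypothetical helper: count distinct query IDs in "-outfmt 6" output.
# Usage: count_queries_with_hits blast_out
count_queries_with_hits() {
    # Hits for one query are written consecutively, so uniq (without sort)
    # is enough to collapse them; tr strips the padding some wc versions add.
    cut -f 1 "$1" | uniq | wc -l | tr -d ' '
}
```

The same pipeline can be run under watch, e.g. watch 'cut -f 1 blast_out | uniq | wc -l'. Queries without any hit are not listed in this format, so the count is a lower bound on the number of queries processed.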

tail -f blast_out works too.

6.1 years ago
holgerbrandl ▴ 30

Here's a Kotlin solution using kscript. The logic to estimate the progress is similar to the other solutions, but it's shorter and more portable (since it is Java-based).

blast_progress(){
kscript - "$@" <<"EOF"
//DEPS de.mpicbg.scicomp:kutils:0.3
//KOTLIN_OPTS -J-Xmx5g

import de.mpicbg.scicomp.bioinfo.openFasta
import java.io.File
import kotlin.system.exitProcess

if (args.size < 2) {
    System.err.println("Usage: blast_progress <fasta> <blastresults>")
    exitProcess(-1)
}

val fastaFile= File(args[0])
val blastResults= File(args[1])

val fastaIds = openFasta(fastaFile).map { it.id }
val procIds = blastResults.useLines { it.map{ it.split("\t")[0]}.distinct().toList()}
val pcDone = 100.0 * procIds.size / fastaIds.size  // percentage, not a fraction

println("Approximately ${pcDone} % of ${fastaIds.size} sequences were processed by blast.")
EOF
}