Question: local blastp against NR database
Chirag Parsania (University of Macau) wrote, 3.5 years ago:


I am running blastp locally against the NR database. I have 272 hardware threads in total (68 cores × 4 threads per core). To use all of these resources, I split my query FASTA file into 270 files and am running blastp simultaneously on all 270 files against NR, with -num_threads set to 1 for each command. My questions are:

1) Will this approach perform better than running fewer jobs with a higher -num_threads value?

2) It has been almost 24 hours, but none of the 270 commands has produced any output yet. Is it normal for blastp against NR to take this long to give its first output?

Thanks in advance.

time blastp nr • 2.4k views
modified 3.5 years ago • written 3.5 years ago by Chirag Parsania

Hi Chirag,

Thanks for asking this question; I have a similar doubt. There is a nice post on GNU parallel, which looks like a better approach, though I have yet to implement it. You can give it a try.

However, it is unusual to get no output in 24 hours. It depends on your query size, but that is still a long time.

modified 3.5 years ago • written 3.5 years ago by lakhujanivijay

Hi Vijay,

Thanks for providing the useful link. It was quite surprising that I didn't get any output after 24 hours, but unfortunately it was true. I tried to figure out the reason, and this is what I understood.

I ran 270 independent blastp commands, each with 1 thread (-num_threads 1). The problem was the database size: NR is more than 70 GB. As far as I can tell, blastp uses multiple threads only to populate (load) the database, so with 1 thread against NR it could not finish loading in a reasonable time (not within 24 hours at 1.5 GHz). To confirm this, I ran a single blastp instance with 50 threads (-num_threads 50), and it started giving me output within 20 minutes, which suggests that the number of threads really matters for populating the BLAST database.

Surprisingly, although I assigned 50 threads, my machine was using only one thread once output started to appear (after 20 minutes). This suggests that, to search for hits, blastp uses a single thread regardless of the value given to -num_threads. So if you dedicate many threads to a BLAST run, the resources are not fully utilised once the database has been populated.

So what I found ideal in this scenario is to populate the BLAST database with the maximum number of threads and then identify hits with a single thread. To achieve this, one has to carefully monitor the resources blastp is using and submit jobs one by one accordingly.

written 3.5 years ago by Chirag Parsania

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

That said, running 270 jobs with a single thread each against nr is extremely inefficient (as you discovered), since each of those jobs needs to load the database indexes into memory. If this is a cluster, I would stay away from GNU parallel and just use the job scheduler to manage multiple jobs. If you have the file split into 270 pieces, then go ahead and start as many jobs, with as many threads per job, as your resources allow (making sure that the threads for one job remain confined to one physical server to reduce backend data chatter).
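
A minimal Python sketch of that trade-off, assuming the 272 hardware threads from the question and an arbitrary 16 threads per job. The chunk file names and the helper names `plan_jobs` and `blastp_command` are made up for illustration; only the blastp flags themselves (-query, -db, -out, -outfmt, -num_threads) come from BLAST+.

```python
def plan_jobs(total_threads, threads_per_job):
    """Return (concurrent_jobs, threads_per_job) without oversubscribing the machine."""
    if total_threads < 1 or threads_per_job < 1:
        raise ValueError("thread counts must be positive")
    # Never ask for more threads per job than the machine has.
    threads_per_job = min(threads_per_job, total_threads)
    return max(1, total_threads // threads_per_job), threads_per_job


def blastp_command(query, db, out, threads):
    """Build the argument list for one blastp run (BLAST+ uses single-dash flags)."""
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6",  # tabular output
        "-num_threads", str(threads),
    ]


# 272 hardware threads, 16 threads per job: 272 // 16 = 17 jobs at a time.
jobs, threads = plan_jobs(272, 16)
commands = [
    # Hypothetical chunk names; use whatever your split step produced.
    blastp_command(f"chunk_{i:03d}.fasta", "nr", f"chunk_{i:03d}.tsv", threads)
    for i in range(jobs)
]
print(jobs, "concurrent jobs, first command:", " ".join(commands[0]))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The right number of threads per job depends on memory and storage as much as on core count, so treat 16 as a starting point to measure from, not a recommendation.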

If you have ~100G of RAM available, then DIAMOND would be a speedy option. You would need to create a new index of nr for use with DIAMOND. For all this to work efficiently you need robust (high-performance) storage; otherwise reading and writing will become a bottleneck.
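
For reference, the two DIAMOND steps (build the index once, then search) could be assembled roughly like this. This is a sketch, not a tuned setup: the file names `nr.faa`, `query.fasta` and `hits.tsv` and the thread count are assumptions, and the helper function names are made up. The long options shown (--in, --db, --query, --out, --outfmt, --threads) are standard DIAMOND options.

```python
def diamond_makedb_command(protein_fasta, db_prefix, threads):
    """Command that builds a DIAMOND database from a protein FASTA file."""
    return [
        "diamond", "makedb",
        "--in", protein_fasta,
        "--db", db_prefix,
        "--threads", str(threads),
    ]


def diamond_blastp_command(query, db_prefix, out, threads):
    """Command that runs a DIAMOND protein search with tabular output."""
    return [
        "diamond", "blastp",
        "--db", db_prefix,
        "--query", query,
        "--out", out,
        "--outfmt", "6",
        "--threads", str(threads),
    ]


# Hypothetical file names and thread count, for illustration only.
makedb = diamond_makedb_command("nr.faa", "nr", 64)
search = diamond_blastp_command("query.fasta", "nr", "hits.tsv", 64)
print(" ".join(makedb))
print(" ".join(search))
```

Run each list with `subprocess.run(cmd, check=True)`. The index only has to be built once and can be reused for every later search.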

modified 3.5 years ago • written 3.5 years ago by GenoMax

Hi genomax,

Can you please direct me to appropriate resources for installing and setting up a job scheduler? I would like to learn more about it.

written 3.5 years ago by lakhujanivijay

If you are doing most of your computing on stand-alone servers, then GNU parallel is a perfect solution, as you suggested. If you are using a compute cluster, then the cluster OS will generally come with a scheduler; examples are SGE, Slurm and PBS. Very large clusters generally use commercial schedulers (e.g. LSF), which require purchasing a license.
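
As an illustration of what scheduler-managed jobs can look like, here is a small Python sketch that writes a Slurm array-job script, one array task per query chunk. It assumes Slurm specifically and chunk files named `chunk_000.fasta`, `chunk_001.fasta`, and so on (hypothetical names); the function name is also made up. The `#SBATCH` options and the `SLURM_ARRAY_TASK_ID` / `SLURM_CPUS_PER_TASK` variables are standard Slurm, but partition, memory and time limits are site-specific and are left out.

```python
def slurm_array_script(n_chunks, threads_per_job, db="nr"):
    """Render a Slurm batch script that runs blastp once per query chunk."""
    if n_chunks < 1 or threads_per_job < 1:
        raise ValueError("n_chunks and threads_per_job must be positive")
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --array=0-{n_chunks - 1}",
        "#SBATCH --nodes=1",  # keep all threads of a task on one server
        f"#SBATCH --cpus-per-task={threads_per_job}",
        "",
        "# Hypothetical chunk naming: chunk_000.fasta, chunk_001.fasta, ...",
        'CHUNK=$(printf "chunk_%03d" "$SLURM_ARRAY_TASK_ID")',
        f'blastp -query "$CHUNK.fasta" -db {db} -out "$CHUNK.tsv" \\',
        '    -outfmt 6 -num_threads "$SLURM_CPUS_PER_TASK"',
        "",
    ])


script = slurm_array_script(270, 16)
print(script)
```

Saving the output to a file and submitting it with `sbatch` lets the scheduler decide how many of the 270 tasks run at once. SGE and PBS have their own array-job syntax, so this sketch does not carry over unchanged.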

written 3.5 years ago by GenoMax

Hi @genomax,

Does increasing -max_target_seqs to 1000 have a large effect on the time it takes blastp to produce its first output? I am running blastp with 10 threads and, after more than 12 hours, there is no output yet.

Second, see the output of the top command here. Although blastp is running, CPU usage is very low and the job status is S, whereas I would expect R.


If you look at the output of top, it seems that none of the processors is working. I am using an Intel Phi 7250 with 68 cores and 4 threads per core. One thing I noticed: previously we had to compile programs ourselves to support AVX-512, the CPU instruction set that suits many-core CPUs like the Intel Phi. However, the blastp RPM is a pre-compiled version, and I cannot tell whether it supports AVX-512, so performance might be degraded. Is this the case?

modified 3.5 years ago • written 3.5 years ago by Chirag Parsania

If you look at the load average, it is at 270+, so if your blast processes are not making up most of that load, then something else is (it may just be I/O on the Phi interface). Are you the only person running jobs on this card/node/server? If not, your jobs appear to have lower priority and hence may be sleeping.
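
One way to check the I/O explanation on Linux: the load average counts processes in uninterruptible sleep (state D, usually waiting on disk) as well as runnable ones (state R), so a high load with low CPU usage often means many processes are stuck in D. The sketch below tallies process states by reading `/proc/<pid>/stat`. It is Linux-only, the function names are made up, and it looks at each process's main thread only, not at every worker thread.

```python
import os


def parse_state(stat_line):
    """Extract the one-letter state (R, S, D, ...) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so take whatever follows the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(name, proc_root="/proc"):
    """Count the states of all processes whose command name equals `name`."""
    counts = {}
    if not os.path.isdir(proc_root):
        return counts  # not a Linux-style /proc; nothing to inspect
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                line = handle.read()
        except OSError:
            continue  # process exited or is not readable
        comm = line[line.find("(") + 1:line.rfind(")")]
        if comm == name:
            state = parse_state(line)
            counts[state] = counts.get(state, 0) + 1
    return counts


print(os.getloadavg())         # 1, 5 and 15 minute load averages
print(count_states("blastp"))  # tally of states for running blastp processes
```

If most blastp processes turn out to be in state D, storage is the more likely bottleneck than CPU.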

Intel Phi is very specialised hardware. I doubt the blast RPM is compiled to take advantage of its special features. Intel Phi, like GPU-type accelerators, is still constrained for large I/O. If you are technically inclined, take a look at the article here. Intel also has a page that describes how blast needs to be set up to run on Phi. It does not appear to be worth using Phi, especially if you have access to a regular cluster.

modified 3.5 years ago • written 3.5 years ago by GenoMax

Hi genomax,

I am the only one using this server, which makes it more surprising to me. Although the load average is high, CPU usage is very low. One thing I have noticed is that as soon as I start the program it shows full CPU usage for some time (until it writes the first output). After that, CPU usage goes down and the output file is never updated.

The article you mentioned about setting up blast on Phi is about the Phi coprocessor, which works like a GPU. The one I am using is a Phi processor, which is a CPU and not a GPU. I believe the coprocessor is older than the processor.


written 3.5 years ago by Chirag Parsania

Do you have any information about the kind/type of server hardware you are using? Since the internal architecture of Phi should be identical (as a processor or co-processor), you would need to recompile blast to take advantage of it (I don't have direct experience, so this is logical speculation).

written 3.5 years ago by GenoMax

How many CPUs do you have in your computer? I set -num_threads to 25, and blastp reduced it to 24 automatically, since there are only 24 CPUs in my computer. It took 8 minutes to finish one job, blasting a single query sequence (length = 242) against the latest nr database (106G). An amazingly long time.

written 2.1 years ago by huanyuyi
Sej Modha (Glasgow, UK) wrote, 3.5 years ago:

I'd split the query file into smaller chunks and run blastp on those, as it would be quicker. Also look at DIAMOND (blastp mode) instead of running BLAST, as it is much faster.
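
A minimal sketch of the splitting step in plain Python, with no external libraries. It deals records out round-robin, which balances the number of sequences per chunk but not their total length; the function names and the tiny example input are made up for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_records(records, n_chunks):
    """Distribute records round-robin into at most n_chunks non-empty lists."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be positive")
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


# Tiny made-up input: three protein records, split into two chunks.
example = [">p1", "MKV", "LAA", ">p2", "MST", ">p3", "MQQ"]
chunks = split_records(read_fasta(example), 2)
for number, chunk in enumerate(chunks):
    print(number, [header for header, _ in chunk])
```

For a real file, pass an open file handle to `read_fasta` and write each chunk out as `header` plus `sequence` lines. If sequence lengths vary a lot, balancing chunks by total residues instead of by record count may give more even run times.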

modified 3.5 years ago • written 3.5 years ago by Sej Modha