Blastn Runtime 606 Queries
12.6 years ago
W Langdon ▴ 30

When I run blastn -num_threads 6 -task blastn-short -db human_genomic on one 36 bp query it typically takes less than a minute and occupies up to about 5 GB. When I gave it a file (using -query) of 606 queries, all of the same length, blast took all weekend (8.5 minutes per query on average). top suggests it was trying to use 60 gigabytes of memory. Also, time says that instead of running six times faster (-num_threads 6) it was effectively using only 27% of one CPU.

Is this expected? My plan in future is to split the query file into 6 and run them in series. Does that sound sensible?

(the server has 8 CPUs and 32 GBytes)

Many thanks Bill

ps: the font in this window is too small for me:-(

Update: I tried splitting another file of queries into ten files of either 56 or 57 queries each. Even the slowest of these averaged about three queries per minute, whilst the fastest did more than 12 per minute.
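
One way to do the splitting described above is sketched below in Python. It assumes the queries are in plain FASTA format (each record starting with a ">" header line); the function names, the `chunk_NNN.fa` file naming and the default of 57 records per file are illustrative choices, not anything blast itself requires.

```python
from itertools import islice
from pathlib import Path


def read_fasta_records(lines):
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        if line.strip():  # skip blank lines
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(query_path, out_dir, per_file=57):
    """Write consecutive groups of `per_file` records to numbered files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(query_path) as handle:
        records = read_fasta_records(handle)
        while True:
            chunk = list(islice(records, per_file))
            if not chunk:
                break
            # Hypothetical naming scheme: chunk_000.fa, chunk_001.fa, ...
            path = out_dir / f"chunk_{len(chunk_paths):03d}.fa"
            path.write_text("".join(chunk))
            chunk_paths.append(path)
    return chunk_paths
```

Calling `split_fasta("queries.fa", "chunks")` (both names are placeholders) returns the list of chunk files, each of which can then be passed to blastn with -query. The last file holds whatever records are left over, so it may be smaller than the others.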

blast memory • 3.0k views

My computer does 50 sequences per second. I am not sure why the difference is so huge. Where did you get your blast from? Which version is it? Are your files on network storage? My BLAST has different parameter names than yours. Did you run formatdb?
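
To put the rates mentioned in this thread side by side, here is a small Python calculation of how long 606 queries would take at each of them. The rates are the ones quoted in the posts; they were measured on different machines and possibly with different databases and settings, so this is only a rough comparison.

```python
QUERIES = 606

# Rates quoted in the thread, converted to queries per minute.
rates_per_minute = {
    "single 606-query run (8.5 min per query)": 1 / 8.5,
    "slowest split file (about 3 per minute)": 3.0,
    "fastest split file (about 12 per minute)": 12.0,
    "50 sequences per second": 50 * 60.0,
}


def minutes_for(queries, rate_per_minute):
    """Total run time in minutes at a constant rate."""
    return queries / rate_per_minute


for label, rate in rates_per_minute.items():
    minutes = minutes_for(QUERIES, rate)
    print(f"{label}: {minutes:.1f} min ({minutes / 60:.2f} h)")
```

At 8.5 minutes per query the whole file needs about 86 hours, against roughly 50 to 200 minutes at the split-file rates.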

Wow. So something is wrong!

blastn was downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.25+-x64-linux.tar.gz

I am now using files on a local disk but this does not seem to be any faster than using files on a network disk. In my case blastn is CPU limited (unless it runs out of RAM and starts paging).

I used http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl to download human_genomic*tar.gz. My reading of the documentation suggested that these files are "preformatted", so I did not explicitly use formatdb.

Thank you
Bill

So you didn't use formatdb? That may be a reason. Files on your local hard disc can be accessed much faster than network files.

12.6 years ago
Eric Fournier ★ 1.4k

You might be trying to run too many queries at once and hitting your physical RAM limit, thus making all of your processes have to use the swap file, dramatically hindering performance.
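
A quick way to check this on a Linux server is to look at /proc/meminfo. The Python sketch below parses text in that format and reports how much swap is in use and whether a given memory requirement fits in physical RAM. The helper names are made up for illustration, and only the standard MemTotal, SwapTotal and SwapFree fields (reported in kB) are assumed.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def fits_in_ram(needed_gb, meminfo):
    """True if `needed_gb` gigabytes is no more than physical RAM."""
    return needed_gb * 1024 * 1024 <= meminfo["MemTotal"]
```

For example, `parse_meminfo(open("/proc/meminfo").read())` gives the dict to pass to the other two functions. If `fits_in_ram(60, info)` is false while blastn is asking for 60 GB, the process cannot avoid swapping.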

Thanks Eric. I think that is consistent with top. I guess I was expecting blastn -task blastn-short to finish each of my queries before proceeding to the next. If it's going to fill the server's RAM before printing anything, is there a benefit to batching up queries in a file? Would it be better to process them one at a time? Thanks again Bill

12.6 years ago
Pablo Pareja ★ 1.6k

Last time I had to use blast for a decent amount of metagenomics data, I came to the conclusion that it was much better to launch a lot of independent blast processes with small pieces of data than just a few with bigger queries. I don't know why, but it seems that blast execution time is not proportional to the number of queries, at least not once that number gets big enough.

Thanks Pablo. Would your "number of queries" be consistent with what I am seeing? I.e. somewhere between 60 and 600 queries in a single blast run, blastn slows down considerably. When launching lots of blast processes, will they share memory? Or does each of them need its own copy of the Human Genome in RAM? Thanks again Bill

Yeah, in my case I found that the best number of queries in terms of performance was somewhere around 100. I must say though that I was launching blast tasks as native processes from a Java program, not from any sort of script in a terminal or something like that. I'm not sure whether that actually changes things or not but I'd say it was faster that way. Regarding memory sharing, they were independent native processes each so they were not sharing anything in RAM. (just for the record I was using nt db)
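
A minimal Python sketch of this "many small independent runs" approach is shown below (the original used Java; Python is used here only for brevity). It assumes blastn from BLAST+ is on the PATH and that the query chunks already exist as files. The function names, the `.out` output naming and the default database name are illustrative assumptions. Since each process keeps its own copy of the database in memory, `max_parallel` defaults to 1, which runs the chunks in series.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def blastn_command(chunk_path, db="human_genomic", threads=1):
    """Build the blastn argument list for one chunk of short queries."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-db", db,
        "-query", str(chunk_path),
        "-out", f"{chunk_path}.out",  # hypothetical output naming
        "-num_threads", str(threads),
    ]


def run_chunks(chunk_paths, max_parallel=1, run=subprocess.run):
    """Run one blastn process per chunk, at most `max_parallel` at once.

    `run` can be swapped for a stand-in function when testing.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [
            pool.submit(run, blastn_command(path), check=True)
            for path in chunk_paths
        ]
        # check=True makes a failed blastn run raise here.
        return [future.result() for future in futures]
```

Raising `max_parallel` only makes sense if the combined memory of that many blastn processes still fits in physical RAM; otherwise the runs start swapping again.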

12.6 years ago
ALchEmiXt ★ 1.9k

What version of blast+ did you use? There has been a "recently" fixed issue with the multi-thread options. You might want to review the update history.

blastn says it is Nucleotide-Nucleotide BLAST 2.2.25+ build Mar 21 2011 12:24:26

Today ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ChangeLog suggests that this is the latest version.

Thanks
Bill