local blastx server's memory usage
8.3 years ago

I am hosting a local blastx (v 2.2.31+) server on a 20-core machine with 128GB of RAM running CentOS 6. I am running a relatively large series of jobs (e.g. one job may have ~10^6 samples averaging ~100 nucleotides per sample, with some samples up to 3000 nucleotides). Unsurprisingly, these runs take on the order of days to complete. I am blasting against the "nr" database.

I submit my job to the machine via SGE and specify num_threads = 20. When I watch the memory usage of blastx, it only ever seems to use between 4 and 10 GB of memory, fluctuating between those two limits. The "nr" database is ~70GB, so I would expect blastx to use on the order of 70GB, not 4-10GB.

This makes me think that it is swapping the database in and out of memory as it runs, which is time consuming and inefficient.

QUESTION: Is there a way to force blastx to load the entire database (since I have the RAM to do so)? Or am I misunderstanding the problem?

blast • 3.6k views
8.3 years ago
h.mon 35k

BLAST does not offer much control over memory usage; it tries to pick a suitable amount of memory on its own. But maybe your system has the BATCH_SIZE environment variable set to a low value, or ulimit imposes a low limit on memory usage.

There are faster alternatives to blastx; two that spring to mind are Rapsearch and DIAMOND.


ulimit -a shows that max memory size is unlimited and BATCH_SIZE is not set. Thanks for the links, I will be sure to look at them.

8.3 years ago

The observed behaviour is normal, because BLAST does not expect to have so much memory available. Consequently, it reads in only part of the database at a time. In my experience, BLAST is also quite bad at multi-threading by itself.

I recommend two things:

  1. Split your input data in some clever way and submit 20 jobs in parallel. This will usually shorten the runtime considerably.
  2. Because you have a lot of RAM available on a single machine, create a ramdisk slightly larger than your database and copy(!!!) the database to that spot. This will significantly reduce the database I/O (BLAST does a lot of that) and again speed up your process.
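One possible reading of "some clever way" for the first point is to balance the parts by total nucleotide count rather than by number of sequences, since the query lengths vary so much. The following is a rough sketch under that assumption; the function names and the `part_NN.fasta` naming are hypothetical, and the greedy balancing is just one option, not something BLAST prescribes.

```python
import heapq
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, n_parts=20):
    """Write n_parts FASTA files with roughly equal total nucleotide counts."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i:02d}.fasta" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    # Min-heap of (nucleotides written so far, part index): each record
    # goes to whichever part currently holds the fewest nucleotides.
    heap = [(0, i) for i in range(n_parts)]
    try:
        for header, seq in read_fasta(path):
            total, i = heapq.heappop(heap)
            handles[i].write(f"{header}\n{seq}\n")
            heapq.heappush(heap, (total + len(seq), i))
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Each resulting part can then be submitted as its own job (for instance as an SGE array job) running blastx with `-num_threads 1`, with `-db` pointing at the copy of the database on the ramdisk.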