Blastn high mem usage, very low CPU usage
5.9 years ago
hdy ▴ 160

I am running a blastn search on a local machine (4 cores, 8 threads, 8 GB of memory). My database is the NCBI bacterial, archaeal, and viral RefSeq, which is about 200 GB. Memory usage is very high: 7711M used, ~14M unused, and VM usage of 395G. CPU usage, however, is very low: 1.6% usr, 2.85% sys, ~95% idle. Is this normal?

The blastn command I used is

blastn -db "bacteria_genomic_74 viral_genomic_74 archaea_genomic_74" \
-query input_file \
-out output_file \
-outfmt 6 \
-max_target_seqs 5


The blastn entry in top shows: CPU ~9%, #TH 16, MEM 2304K

Sorry, I cannot take a snapshot of top because the computer is busy and not responding very quickly, so I do not want to disturb it.

blast alignment • 2.8k views

Is bacteria_genomic_74 viral_genomic_74 archaea_genomic_74 the exact filename prefix of a single blast database or do you have three separate blast databases?


Three separate ones. Bacteria is the largest at ~200G; the other two are pretty small.


If you don't have enough memory, the system spends most of its time paging data between RAM and disk instead of computing, which is why CPU usage is low. You need to use less memory. You could split the search into separate databases and blast runs, or you could run cd-hit to reduce the size of your database. There is probably also some blast version (or analogous software) that does not load the whole database into memory; I would look for that.
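
The split-and-merge idea above can be sketched in shell. This is only a rough sketch under stated assumptions: the helper names `search_pieces` and `merge_top_hits` and the `hits_<db>.tsv` output names are hypothetical, and the total database length passed to `-dbsize` has to be supplied by you. Passing the same `-dbsize` to every run is meant to keep e-values comparable across pieces, since e-values otherwise scale with the size of whichever piece was searched.

```shell
#!/bin/sh
# Sketch: search each database piece separately, then merge the tabular results.

# Run blastn once per database piece (hypothetical helper; defined but not called here).
# $1 = query file, $2 = total database length in bases, remaining args = database names.
search_pieces() {
    query=$1; dbsize=$2; shift 2
    for db in "$@"; do
        blastn -db "$db" -query "$query" -outfmt 6 \
               -max_target_seqs 5 -dbsize "$dbsize" \
               -out "hits_${db}.tsv"
    done
}

# Merge per-piece outfmt 6 files and keep the best N hits per query,
# ranked by bit score (column 12 of the default tabular format).
# $1 = N, remaining args = result files.
merge_top_hits() {
    n=$1; shift
    cat "$@" |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k12,12gr |
        awk -F '\t' -v n="$n" '$1 != prev { prev = $1; count = 0 } ++count <= n'
}
```

For example, after `search_pieces input_file <total_bases> bacteria_genomic_74 viral_genomic_74 archaea_genomic_74`, something like `merge_top_hits 5 hits_*.tsv > output_file` would give the five best hits per query across all pieces. The `g` sort key is used because large bit scores may be printed in scientific notation.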


So if the database is very large (for example, whole eukaryotic genomes), what people do is split the database into smaller pieces, blast the query against each piece, and merge all the results at the end. Is this right?


Go here

## Controlling concatenation of queries

As described above, BLAST+ works more efficiently if it scans the database once for multiple queries. This feature is known as concatenation. Unfortunately, for some searches the concatenation values are not optimal, too many queries are searched at once, and the process can consume too much memory. For applications besides BLASTN (which uses an adaptive approach), it is possible to control these values by setting the BATCH_SIZE environment variable. Setting the value too low will degrade performance dramatically, so this environment variable should be used with caution.

## Memory usage

The BLAST search programs can exhaust all memory on a machine if the input is too large or if there are too many hits to the BLAST database. If this is the case, please see your operating system documentation to limit the memory used by a program (e.g.: ulimit on Unix-like platforms). Setting the BATCH_SIZE environment variable as described above may help.
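
As a rough illustration of the `ulimit` suggestion, the sketch below wraps a command in a subshell with a cap on its virtual address space. The helper name `run_with_mem_limit` and the 6 GB figure are assumptions made for this example, not something taken from the BLAST documentation. Note that `ulimit -v` counts memory-mapped files as well, so with a very large database the search may stop with an allocation error rather than simply run with less memory.

```shell
#!/bin/sh
# Run a command with a cap on its virtual address space (in kilobytes).
# The limit is set inside a subshell, so the calling shell is unaffected.
# $1 = limit in KB, remaining args = command to run.
run_with_mem_limit() {
    limit_kb=$1; shift
    (
        ulimit -v "$limit_kb" || exit 1
        "$@"
    )
}

# Example (not executed here): cap blastn at roughly 6 GB of address space.
# run_with_mem_limit 6000000 blastn -db bacteria_genomic_74 -query input_file -out output_file -outfmt 6
```

Whether a hard cap is preferable to letting the machine swap depends on the workload: a failed run is at least quick to notice, whereas a thrashing one can tie up the machine for a long time.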