10 weeks ago
katieostrouchov
I am attempting to perform a local blastp search against the pre-indexed version of the entire protein nr database on an Ubuntu instance. The problem I am having is that the temporary files are not being written to a storage drive that I have, and I run out of memory very quickly even though the input and output directories are on large storage drives.
How do we change the buffer/temp folder for the blastp command and where might this be located on the system?
Here is the example code I used.
Downloading the pre-indexed version of nr:
sudo perl update_blastdb.pl --passive --decompress nr
Searching nr with query protein fasta file:
blastp -query /mnt/myquery.fasta -db /mnt/blastdb/nr -outfmt 6 -out ./myresults.out
Where the blastp command is located:
kostrouchov@myip:~$ which blastp
/usr/bin/blastp
It is unclear if you are having a problem while downloading the indexes or during the actual run.
nr is over 100 GB of files.
Yes, the download of the pre-indexed nr to a mounted storage location was successful. Running blastp to search the nr database fills up the /dev/root/ storage (100 GB), which must be where the command writes its temporary files. My instance has 512 GB of RAM.
First of all, memory and storage are different things. If you run out of storage, you have to ask the sysadmin to allocate more for your needs. If you are working on a shared machine, it could be that your coworkers are hammering the same network drive as you and it is just painfully slow or full.
What you can always do is split the query file into smaller ones and run them separately; that could reduce your memory usage. Furthermore, if you run out of memory, the kernel will use the swap storage. Check if your system has any swap file:
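Splitting the query file can be done with a short awk one-liner. A sketch (the toy input, chunk size, and file names are only examples — point the awk command at your real FASTA and use a larger chunk size):

```shell
# Toy input for demonstration -- substitute your real query FASTA here
printf '>s1\nAAA\n>s2\nCCC\n>s3\nGGG\n>s4\nTTT\n>s5\nAAC\n' > myquery.fasta

# Start a new file chunk_01.fasta, chunk_02.fasta, ... every 2 sequences
# (for a real nr search you would use a much larger chunk size, e.g. 1000)
awk '/^>/ { if (n % 2 == 0) { chunk++; file = sprintf("chunk_%02d.fasta", chunk) } n++ }
     { print > file }' myquery.fasta

ls chunk_*.fasta
```

Each chunk can then be searched with its own blastp invocation, sequentially or in parallel, which keeps the per-run footprint smaller.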
swapon -s
If that's not enough, or no swap is allocated at all, allocate the desired size (there are entire guides out there on how much swap you should allocate):
sudo dd if=/dev/zero of=/swapfile bs=1M count=32768 (32768 blocks of 1 MiB = 32 GiB)
Note that dd only creates the file; it still has to be formatted with mkswap and enabled with swapon before the kernel can use it.
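The full sequence, as a sketch (size and path are examples; all of this requires root):

```shell
sudo dd if=/dev/zero of=/swapfile bs=1M count=32768   # 32768 blocks of 1 MiB = 32 GiB
sudo chmod 600 /swapfile                              # swap files must not be world-readable
sudo mkswap /swapfile                                 # format the file as swap
sudo swapon /swapfile                                 # enable it immediately
swapon -s                                             # verify the new swap is listed
```

To keep the swap file across reboots you would also add it to /etc/fstab.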
Another way is to adjust the BATCH_SIZE during the BLAST run, but I have never done this before. Here you can check out how to optimize memory usage during a BLAST run: blast memory usage
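If you want to try the batch-size route: the BLAST+ manual documents a BATCH_SIZE environment variable that controls how many query letters are processed per batch. A sketch, reusing the paths from the question — treat the exact value as an assumption to tune for your machine:

```shell
# Assumed value: lower BATCH_SIZE means smaller query batches and less memory
export BATCH_SIZE=100000
blastp -query /mnt/myquery.fasta -db /mnt/blastdb/nr -outfmt 6 -out ./myresults.out
```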
Finally, on a Unix machine the default temporary directory is
/tmp. But if BLAST runs out of memory during the run, the data won't show up as files in your temporary directory; the kernel pages memory out to the swap space instead.
Anyway: try to check how much space is occupied in /tmp:
du -hc --max-depth=0 /tmp
Check how much space is left on that disk:
df -h /tmp/
And what always helps is running htop. This tool shows memory, system, and swap usage and more in real time. So run your BLAST query in tmux, then run htop to see what is causing the error.
A workaround I found is to download the full nr.gz FASTA from NCBI and use DIAMOND blastp, since its documentation covers changing the temporary file location ("--tmpdir") and the batch size ("--index-chunks", "--block-size"):
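A sketch of that DIAMOND workflow (the paths and the block/chunk values are examples; check the DIAMOND manual for the exact semantics in your version):

```shell
# Build a DIAMOND database from the nr FASTA (DIAMOND reads .gz directly)
diamond makedb --in nr.gz --db /mnt/blastdb/nr_diamond

# Search, keeping temporary files on the large mounted drive;
# --block-size and --index-chunks trade memory use against speed
diamond blastp --query /mnt/myquery.fasta --db /mnt/blastdb/nr_diamond \
    --tmpdir /mnt/tmp --block-size 2 --index-chunks 4 \
    --outfmt 6 --out ./myresults_diamond.out
```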