I need to run a large number of queries with psiblast against the nr database. I am reproducing some results and need to use the same parameters as the original study, so I cannot speed up the computation by using fewer iterations, etc., nor can I use a smaller db. I found this page, https://ncbi.github.io/blast-cloud/blastdb/pre-cache-blastdbs.html, which describes a method for caching a database in memory. However, I'm not sure if it is current, and I don't understand how piping the files to /dev/null achieves this. Does anyone know if it's possible to use psiblast with a "preloaded" database? And if so, how do you go about it?
-- SOLUTION --
I was able to significantly speed up this process using a solution based on GenoMax's reply. First, create a temp dir in memory:
mkdir -p -m 700 /dev/shm/$some_name
Then copy the db files to the dir:
cp ./nr/* /dev/shm/$some_name
Now, when running your query, point psiblast at /dev/shm/$some_name. This of course relies on having a machine with a large amount of RAM.
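For anyone who wants to script this, here is a minimal sketch of the two steps above. The helper name stage_db, the directory /dev/shm/blastdb_nr, and the query and output file names in the usage comment are placeholders I made up, and the psiblast options shown are only an example; use whatever parameters your tool specifies.

```shell
# stage_db SRC_DIR DEST_DIR
# Create a private directory (mode 700) and copy every database file into it.
# DEST_DIR is expected to live under /dev/shm so the copy ends up in RAM.
stage_db() {
    local src="$1" dest="$2"
    mkdir -p -m 700 "$dest" && cp "$src"/* "$dest"/
}

# Example usage (all paths and file names are hypothetical):
#   stage_db ./nr /dev/shm/blastdb_nr
#   psiblast -db /dev/shm/blastdb_nr/nr -query query.fasta \
#       -num_iterations 3 -out_ascii_pssm query.pssm
#   rm -rf /dev/shm/blastdb_nr   # release the memory when you are done
```

Remember to remove the copy afterwards, since files left in /dev/shm keep occupying memory until they are deleted or the machine reboots.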
I don't want to nitpick, but this will not be fully reproducible since nr today is not the same as what the study used (assuming that was from some past time). There are no archival copies of the nr database available, and you probably can't (or don't want to try to) create a copy that matches the old one.
Yes, you're right. I simplified to avoid the explanation. What I'm actually trying to do is precompute the psiblast results for a tool that uses the PSSM as an intermediate input, so I'm using the same parameters the tool specifies internally. The goal isn't actually to reproduce the results, but rather to follow the same procedure the tool uses internally, to avoid as much as possible any deviation from its expectations.
I have a hunch that the caching is being done to local disk (it being an AMI). You can simply use nr from the location you have. Unless you have a large amount of RAM, caching the database in memory would likely not work. But if you do have the RAM, then creating a RAM disk may help: https://linuxhint.com/create-ramdisk-linux/
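Before copying anything, it may be worth checking that the database actually fits. A rough sketch, assuming a Linux system with the standard du, df, and awk tools; the function name db_fits is made up for illustration. Note that a tmpfs mount such as /dev/shm is, by default, usually limited to about half of physical RAM, so the free space df reports there can be smaller than total memory.

```shell
# db_fits SRC_DIR TARGET_DIR
# Succeeds (exit status 0) if the total size of SRC_DIR is smaller than the
# free space on the filesystem that holds TARGET_DIR. Sizes are in KiB.
db_fits() {
    local need_kb avail_kb
    need_kb=$(du -sk "$1" | awk '{print $1}')
    # -P forces one line per filesystem, so the value is on line 2, column 4.
    avail_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$need_kb" -lt "$avail_kb" ]
}

# Example usage (paths are hypothetical):
#   if db_fits ./nr /dev/shm; then echo "nr should fit"; else echo "not enough room"; fi
```

This only compares sizes at one moment; psiblast itself also needs working memory, so some headroom beyond the database size is advisable.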
Cool, I'll check this out. I have about 1.5 TB of RAM, so maybe it will be enough.
Oh wow, this sped it up by an enormous amount. Instead of using the ramdisk technique in that link, I used /dev/shm. I guess it's a similar process. Thanks for your help!
Excellent. /dev/shm is shared memory (a RAM-backed tmpfs on most Linux systems), so it is similar.
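On the original question of how sending files to /dev/null can "preload" a database: as far as I understand it, the output is irrelevant, and the point is the read itself. When a file is read, the Linux kernel keeps the blocks in its page cache while memory allows, so a later read of the same file by psiblast can be served from RAM rather than disk. A minimal sketch of that idea follows; the function name warm_cache is made up, and unlike a copy in /dev/shm the kernel is free to evict these pages under memory pressure.

```shell
# warm_cache DIR
# Read every regular file in DIR once and discard the bytes. The side effect
# is that the kernel may keep the file contents in its page cache.
# Prints the number of files that were read.
warm_cache() {
    local f count=0
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            cat "$f" > /dev/null
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example usage (path is hypothetical):
#   warm_cache ./nr
```

The /dev/shm copy uses roughly the same amount of RAM but guarantees that the files stay in memory until you delete them.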