Precaching nr database for multiple psiblast queries
20 months ago

I need to run a large number of psiblast queries against the nr database. I am reproducing some results and need to use the same parameters as the original study, so I cannot speed up the computation by using fewer iterations or a smaller database. I found this page, https://ncbi.github.io/blast-cloud/blastdb/pre-cache-blastdbs.html, which describes a method for caching a database in memory. However, I'm not sure whether it is current, and I don't understand how piping the files to /dev/null achieves this. Does anyone know if it's possible to use psiblast with a "preloaded" database? If so, how do you go about it?

-- SOLUTION --

I was able to speed this up significantly using a solution based on GenoMax's reply. First, create a temporary directory in memory: mkdir -p -m 700 /dev/shm/$some_name. Then copy the database files into it: cp ./nr/* /dev/shm/$some_name. Now, when running your query, point psiblast at the copy in /dev/shm/$some_name. This of course relies on having a machine with enough RAM to hold the whole database.
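The steps above can be sketched as a small shell function. This is only an illustration under assumptions: the function name stage_db, the directory name nr_cache, the database prefix nr, and the query and output file names are hypothetical and not from the original post.

```shell
# Sketch: copy a BLAST database into a RAM-backed directory.
# stage_db SRC_DIR DEST_DIR
#   SRC_DIR  - directory holding the database files (e.g. ./nr)
#   DEST_DIR - target directory (e.g. /dev/shm/nr_cache, a hypothetical name)
stage_db() {
    src="$1"
    dest="$2"
    # Create the target directory, readable only by the current user.
    mkdir -p -m 700 "$dest" || return 1
    # Copy every database file; this is what actually puts the data in memory
    # when DEST_DIR lives under /dev/shm.
    cp "$src"/* "$dest"/ || return 1
}

# Example usage (assumes the database files share the prefix "nr"):
#   stage_db ./nr /dev/shm/nr_cache
#   psiblast -db /dev/shm/nr_cache/nr -query query.fasta \
#            -num_iterations 3 -out_ascii_pssm query.pssm
#   rm -rf /dev/shm/nr_cache    # free the memory when finished
```

Files under /dev/shm stay in memory until they are deleted or the machine reboots, so removing the directory afterwards is worth doing on a shared machine.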

psiblast nr blast • 1.0k views

I am reproducing some results and need to use the same parameters as the original study

I don't want to nitpick, but this will not be fully reproducible, since nr today is not the same as what the study used (assuming that was from some time in the past). There are no archival copies of the nr database available, and you probably can't (or don't want to try to) create a copy that matches the old one.


Yes, you're right. I simplified to avoid the explanation. What I'm actually trying to do is precompute the psiblast results for a tool that uses the PSSM as an intermediate input, so I'm using the same parameters the tool specifies internally. The goal isn't actually to reproduce the results, but rather to follow the same procedure the tool uses internally, to avoid as much as possible any deviation from its expectations.


I have a hunch that the caching is being done to local disk (it being an AMI). You can simply use nr from the location you have. Unless you have a large amount of RAM, caching the database in memory is unlikely to work. But if you do have the RAM, creating a RAM disk may help: https://linuxhint.com/create-ramdisk-linux/


Cool, I'll check this out. I have about 1.5 TB of RAM, so maybe it will be enough.
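Whether the database fits can be checked before copying anything. The following is a rough sketch, not a definitive test: the function name db_fits is hypothetical, and it only compares the on-disk size of the database directory with the free space reported for the target filesystem. On many Linux systems /dev/shm is capped at about half of physical RAM by default, so the free space there can be smaller than the total memory.

```shell
# Sketch: check whether a database directory fits on a target filesystem.
# db_fits SRC_DIR TARGET_DIR
#   Succeeds (exit status 0) if the contents of SRC_DIR are no larger than
#   the space available on the filesystem that holds TARGET_DIR.
db_fits() {
    # Total size of the source directory, in KB.
    need_kb=$(du -sk "$1" | awk '{print $1}')
    # Available space on the target filesystem, in KB (POSIX output format).
    free_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    echo "need ${need_kb} KB, free ${free_kb} KB"
    [ "$need_kb" -le "$free_kb" ]
}

# Example usage:
#   db_fits ./nr /dev/shm && echo "nr should fit in /dev/shm"
```

The working memory that psiblast itself needs comes on top of this, so some headroom beyond the raw database size is advisable.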


Oh wow, this sped it up an enormous amount. Instead of using the RAM disk technique in that link, I used /dev/shm. I guess it's a similar process. Thanks for your help!


Excellent. /dev/shm is a shared-memory filesystem held in RAM, so it works much like a RAM disk.
