I'm trying to search a large batch of sequences against the nr database with a locally installed BLAST, running blastx with -task blastx-fast. I've split the file into batches of a few thousand sequences so I can run them in parallel, but at this rate it's still going to take weeks. Would the search run faster if the nr database were kept on an SSD or a USB stick rather than on an ordinary hard drive?
Perhaps, but plenty of RAM (~40 GB) and DIAMOND may be more worth looking at. You will need to build a DIAMOND database from nr with diamond makedb. A normal stick (if you mean a plain USB drive) is not going to cut it at all. Can you alter your workflow so that you don't have to brute-force it? Maybe you can cluster the sequences first, or use HMMs to reduce your dataset size?