I'm trying to turn my downloaded BLAST nr database into something I can use with DIAMOND. Apparently I need to convert it to a FASTA file first. The conversion is running right now; the output is already over 200 GB and still growing. I'm wondering whether my computer will be able to hold the full file. Will the FASTA file end up the same size as the BLAST nr database (700 GB), or will it stay around 200-300 GB?
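The thread never settles the final FASTA size, so one practical option is to compare the free space on the target drive with a worst-case estimate while the conversion runs. The following is a minimal sketch of that check. The function name `space_report` and the example path are hypothetical, and the 700 GB figure is only the upper guess from the question, not a measured value.

```python
import os
import shutil


def space_report(fasta_path, expected_final_bytes):
    """Compare free disk space with what a growing file may still need.

    expected_final_bytes is a guess supplied by the caller; nothing here
    can predict the real final size of the FASTA file.
    """
    fasta_path = os.path.abspath(fasta_path)
    # Size written so far (0 if the conversion has not created the file yet).
    current = os.path.getsize(fasta_path) if os.path.exists(fasta_path) else 0
    # Free space on the drive that holds the output file.
    free = shutil.disk_usage(os.path.dirname(fasta_path)).free
    # Bytes that would still have to be written to reach the guessed size.
    still_needed = max(expected_final_bytes - current, 0)
    return {
        "current_bytes": current,
        "free_bytes": free,
        "still_needed_bytes": still_needed,
        "fits": free >= still_needed,
    }


# Example call (hypothetical path, worst case taken from the question):
# print(space_report(r"D:\nr.fasta", 700 * 10**9))
```

If `fits` comes back `False` for the worst case, it is safer to point the conversion at a larger drive than to wait and see.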
I tried that, but it said I didn't have the right file, even though I specified the correct path to the database folder. I then read somewhere that the prepdb command needs a FASTA file to work.
Have you downloaded the taxonomy database files? See https://github.com/bbuchfink/diamond/issues/859
The error I am getting says: "Error: No BLAST protein database was found at the specified path." It doesn't mention taxonomy files, which makes me think it just doesn't recognize the database as a whole.
Do you have all the uncompressed nr files in the directory that you are using as input for prepdb?
Yes. I got 124 .tar.gz files at first, and then unzipped all of them, so I now have a bunch of pxm, phd, pin, pog, phi, ppd, and psq files, and I also have a few files called taxdb.
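With 124 archives, a single failed extraction is easy to miss. A rough way to check is to group the files by volume name and report any volume that lacks one of its core files. The sketch below assumes the standard BLAST protein layout, where each volume (for example `nr.00`) has `.pin`, `.phr`, and `.psq` files. That required set, and the function name `missing_files_by_volume`, are assumptions for illustration and not something the thread or DIAMOND specifies.

```python
from collections import defaultdict
from pathlib import Path

# Assumed core files of one BLAST protein database volume.
REQUIRED = frozenset({".pin", ".phr", ".psq"})


def missing_files_by_volume(db_dir, required=REQUIRED):
    """Map each incomplete volume name to the required extensions it lacks."""
    found = defaultdict(set)
    for f in Path(db_dir).iterdir():
        if f.is_file():
            # "nr.00.pin" -> stem "nr.00", suffix ".pin"
            found[f.stem].add(f.suffix.lower())
    return {
        name: sorted(required - exts)
        for name, exts in sorted(found.items())
        # Only names that have at least one core file count as volumes,
        # so taxdb files and alias files are ignored.
        if exts & required and not required <= exts
    }


# Example call (path taken from the thread):
# print(missing_files_by_volume(r"D:\blast_databases"))
```

An empty result means every volume that has any core file has all three. It does not prove that every volume was downloaded in the first place.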
Are you providing the full or a relative path to that directory with -d for prepdb?
Yes. All the database files are in a folder called "blast_databases" with the path "D:\blast_databases".
I am running the command "diamond prepdb -d D:\blast_databases" from the folder containing diamond.exe
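One thing the thread does not test is whether -d should point at the folder or at the database name inside it. BLAST tools generally identify a database by directory plus base name without an extension (for example a path ending in `nr`), and the DIAMOND documentation appears to describe -d for prepdb the same way. Treat that as an assumption to verify against the DIAMOND documentation, not as a confirmed diagnosis. The sketch below only lists the prefixes that look like databases in a folder, so they can be tried as the -d argument. The function name `candidate_db_prefixes` is hypothetical.

```python
from pathlib import Path


def candidate_db_prefixes(db_dir):
    """Return directory-plus-base-name prefixes that have .pin index files."""
    prefixes = set()
    for f in Path(db_dir).glob("*.pin"):
        base = f.stem  # "nr.00" for a volume, "nr" for a single-file database
        head, dot, tail = base.rpartition(".")
        # Strip a numeric volume suffix such as ".00" to get the base name.
        if dot and tail.isdigit():
            base = head
        prefixes.add(str(f.with_name(base)))
    return sorted(prefixes)


# Example call (path taken from the thread):
# for prefix in candidate_db_prefixes(r"D:\blast_databases"):
#     print("diamond prepdb -d", prefix)
```

If this prints a prefix ending in `nr`, running prepdb with that prefix instead of the bare folder is a cheap thing to try before moving to another machine.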
Hmm. You are trying to do this on Windows. Are you still on your local machine, or is this a VM with more RAM etc.? Are you using the latest version of DIAMOND?
If I had to guess, you are likely running into the problem that 16 GB of RAM is not enough on the local PC. DIAMOND also requires tens of GB of free RAM. DIAMOND is likely not able to read the database files, and the error you see may be misleading.
I'm trying on a local machine. If I tried on a VM, would that fix the issue I'm having with DIAMOND not being able to read the database files?
As long as there are enough resources, I would say yes. If you are able, switch to Linux when you go to the VM. It would make life easier in the future, especially when you need to parse the output of the alignments etc.