Current size of blast nr database as FASTA file
8 weeks ago

I'm trying to turn my downloaded BLAST nr database into something I can use with DIAMOND. Apparently I need to convert it to a FASTA file first. The conversion is running right now; the output is already over 200 GB and still growing, and I'm wondering whether my computer will be able to hold the full file. Will the FASTA file end up the same size as the BLAST nr database (700 GB), or will it stay around 200-300 GB?
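
For anyone wanting to check free space before (or during) a long conversion like this, here is a minimal Python sketch using only the standard library. The 700 GB figure is just the worst case mentioned in the question, not a measured FASTA size, and the `has_room` helper and its 20 GB margin are made up for illustration.

```python
import shutil

GB = 1024 ** 3

def has_room(path, needed_gb, margin_gb=20):
    """Return True if the drive holding `path` has needed_gb plus a margin free."""
    free_gb = shutil.disk_usage(path).free / GB
    return free_gb >= needed_gb + margin_gb

if __name__ == "__main__":
    # 700 is the worst-case guess from the question, not a known output size.
    print(has_room(".", needed_gb=700))
```

Pointing `path` at the drive that receives the output (for example `D:\`) is what matters; the check says nothing about how large the final file will actually be.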

fasta blast • 943 views
8 weeks ago
GenoMax 154k

I'm trying to turn my downloaded BLAST nr database into something I can use with DIAMOND. Apparently I need to convert it to a FASTA file first.

No, you do not need to convert the pre-formatted BLAST+ databases to FASTA. Use DIAMOND's prepdb command (diamond prepdb) to prepare the pre-formatted BLAST+ database files for use with DIAMOND.

Prepare BLAST database for use with Diamond. This call requires the path to the BLAST database (option -d) and will write a number of small auxiliary files into the database directory.
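
As a rough illustration of the call described above, the sketch below builds the `diamond prepdb -d <path>` argument list in Python and, optionally, runs it. The function names are hypothetical, it assumes `diamond` is on the PATH, and the value passed to `-d` is whatever path you supply; nothing here is taken from the DIAMOND source.

```python
import subprocess

def build_prepdb_command(db_path, diamond_exe="diamond"):
    """Build the argument list for `diamond prepdb -d <db_path>`."""
    return [diamond_exe, "prepdb", "-d", str(db_path)]

def run_prepdb(db_path, diamond_exe="diamond"):
    """Run prepdb and return (exit code, combined output) for inspection."""
    result = subprocess.run(
        build_prepdb_command(db_path, diamond_exe),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

Capturing the output rather than letting it scroll past makes it easier to quote the exact error message when asking for help.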

The current uncompressed nr FASTA file will be hundreds of GB.

I tried that, but it said I didn't have the right file, even though I specified the correct path to the database folder. Then I read somewhere that the prepdb command needs a FASTA file to work.

Have you downloaded the taxonomy database files? See https://github.com/bbuchfink/diamond/issues/859

The error I am getting says "Error: No BLAST protein database was found at the specified path." It doesn't mention taxonomy files, which makes me think it just isn't recognizing the database as a whole.

Do you have all the uncompressed nr files in the directory that you are using as input for prepdb?

Yes. I downloaded 124 .tar.gz files and extracted all of them, so I now have a bunch of pxm, phd, pin, pog, phi, ppd, and psq files, plus a few files called taxdb.
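
A quick way to double-check an extraction like this is to inventory the folder. The sketch below counts files per extension and flags volumes that lack one of the usual core files of a protein BLAST database (`.phr`, `.pin`, `.psq`). Which files DIAMOND itself looks for is not something I have verified, so treat the `required` default as an assumption; both function names are made up for this example.

```python
from collections import Counter
from pathlib import Path

def extension_counts(db_dir):
    """Count files per extension in db_dir (non-recursive)."""
    return Counter(p.suffix for p in Path(db_dir).iterdir() if p.is_file())

def volumes_missing_files(db_dir, required=(".phr", ".pin", ".psq")):
    """Map each volume base name (e.g. 'nr.00') to the required extensions it lacks."""
    found = {}
    for p in Path(db_dir).iterdir():
        if p.is_file() and p.suffix in required:
            found.setdefault(p.stem, set()).add(p.suffix)
    return {
        base: sorted(set(required) - exts)
        for base, exts in found.items()
        if set(required) - exts
    }
```

An empty dictionary from `volumes_missing_files` means every volume it saw has all of the listed extensions; it does not prove the archive set itself is complete.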

Are you providing the full path or a relative path to that directory with -d when running prepdb?

Yes. All the database files are in a folder called "blast_databases" with the path "D:\blast_databases"

I am running the command "diamond prepdb -d D:\blast_databases" from the folder containing diamond.exe
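
One thing that may be worth trying, offered as an assumption rather than a confirmed fix: BLAST tools conventionally take a database path that ends in the database base name (for example `nr`) rather than the containing folder, and `-d` here may expect the same. The Python sketch below only builds that candidate path and prints the resulting command; the base name `nr` and the helper name are guesses for illustration.

```python
from pathlib import PureWindowsPath

def db_path_with_basename(db_dir, base_name="nr"):
    """Join a Windows folder path and an assumed database base name."""
    return str(PureWindowsPath(db_dir) / base_name)

if __name__ == "__main__":
    candidate = db_path_with_basename(r"D:\blast_databases")
    # Candidate command to try by hand; the base name "nr" is an assumption.
    print("diamond prepdb -d " + candidate)
```

If the files in the folder are named differently (say `nr.00.pin` versus something else), the base name would need to match whatever precedes the volume number.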

Hmm. You are trying to do this on Windows. Are you still on your local machine, or is this a VM with more RAM etc.? Are you using the latest version of DIAMOND?

If I were to guess, you are likely running into the problem of 16 GB of RAM not being enough on the local PC. DIAMOND also requires tens of GB of free RAM. DIAMOND is likely not able to read the database files, and the error you see may be misleading.

I'm trying on a local machine, but if I tried on a VM, would that fix the issue I'm having with DIAMOND not being able to read the database files?

As long as there are enough resources, I would say yes. If you are able, switch to Linux when you go to the VM. It will make life easier in the future, especially when you need to parse the alignment output.
