I have downloaded the nr database from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz and have extracted the files using 7-zip. When I do this I only get one file "nr".
Yet, when I download only part of the database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz) I can extract the .gz file, then the .tar file, and I get a whole bunch of files (e.g., nr.00.phd, nr.00.phi, etc.) and have been able to successfully BLAST against this database.
I am not sure what to do with the single "nr" file I get when downloading the entire database. Running a BLAST search with a command such as "blastp -infile.fasta -db db/nr -out outfile.txt -num_alignments 1" does not work, but "blastp -infile.fasta -db db/nr.00 -out outfile.txt -num_alignments 1" works perfectly. With the full nr database I get an error saying "No Alias or Index File found for protein in database [db/nr]".
Thank you.
It may take a while to build the indexes for the nr database yourself. Just get the pre-made indexes from ftp://ftp.ncbi.nih.gov/blast/db/. You want to get all the nr*.tar.gz files and then unarchive them in one folder. Running the search then only needs the basename of the database, which would be nr.
nr means non-redundant.
It is the whole database.
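The unpacking step for the pre-made indexes could be sketched roughly like this. It assumes the nr.*.tar.gz volumes have already been downloaded into one directory; the function name extract_nr_volumes and the directory names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: unpack every nr.*.tar.gz volume from src_dir into db_dir.
# extract_nr_volumes is a hypothetical helper name, not part of BLAST+.
extract_nr_volumes() {
    src_dir=$1
    db_dir=$2
    mkdir -p "$db_dir" || return 1
    for archive in "$src_dir"/nr.*.tar.gz; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$archive" ] || continue
        tar -xzf "$archive" -C "$db_dir" || return 1
    done
}

# Example usage (assumed paths):
#   extract_nr_volumes downloads db
#   blastp -query infile.fasta -db db/nr -out outfile.txt -num_alignments 1
```

Once all volumes sit in the same folder together with the alias file that ships in the archives, the search should only need the basename (db/nr here). Note that blastp takes its input through -query.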
The comment after this one was helpful (it appears to have been removed) so I am going to copy it here:
"This 'nr' file is in fasta format, right? You probably just need to run 'makeblastdb' first. Something like this should work: 'makeblastdb -dbtype prot -in nr'."
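If you go the makeblastdb route instead, a minimal sketch might look like the following. It assumes the decompressed "nr" file is protein FASTA and that makeblastdb from BLAST+ is on your PATH; the wrapper name build_protein_db and the output prefix are hypothetical. Expect this to run for a long time on the full nr file.

```shell
#!/bin/sh
# Sketch: check that the input looks like FASTA, then build a protein
# BLAST database from it. build_protein_db is a hypothetical wrapper name.
build_protein_db() {
    fasta=$1
    out_prefix=$2
    [ -r "$fasta" ] || { echo "cannot read $fasta" >&2; return 1; }
    # A FASTA file starts with a '>' header line.
    IFS= read -r first_line < "$fasta" || true
    case $first_line in
        ">"*) ;;
        *) echo "$fasta does not look like FASTA" >&2; return 1 ;;
    esac
    mkdir -p "$(dirname "$out_prefix")" || return 1
    makeblastdb -in "$fasta" -dbtype prot -out "$out_prefix"
}

# Example usage (assumed paths):
#   build_protein_db nr db/nr
#   blastp -query infile.fasta -db db/nr -out outfile.txt -num_alignments 1
```

The FASTA check is only a cheap guard against pointing makeblastdb at the wrong file, such as the still-compressed nr.gz.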
See this post, for example:
A: blastn execution error, the correct command line format
or this one:
nr- protein database
and search biostars.org for makeblastdb; there are many posts about this command.
To do so, press the LATEST button in the upper left corner
and type 'makeblastdb' into the empty search field in the middle (labelled 'Live search: start typing...').