Question: Blast Help On Nucleotide Collection Nr/Nt
8.0 years ago by
Matt40
United States
Matt40 wrote:

I am running a local BLAST server. I can format and BLAST my own databases. However, I am unsure how to set up the "Nucleotide collection nr/nt" database from this NCBI BLAST URL.

Can I just download a preformatted db and use the update script? Which database is it? Is it just the nr and nt databases combined? Isn't blastn used for the nt database and blastp for the nr database? Can I BLAST them both at the same time? If so, how?

Also, downloading nr gives me two files, nr.00.tar.gz and nr.01.tar.gz. Is this right? How can I set things up to BLAST just "nr" rather than "nr.00 nr.01"?

I have been using the following commands:

blastp -word_size 7 -evalue 10 -query test.fasta -db "nr.00 nr.01"

and blastn -word_size 11 -evalue 10 -query test.fasta -db nt.00

Thank you for your help!

Matt

ncbi fasta alignment blast • 22k views
modified 8.0 years ago by Digiomics160 • written 8.0 years ago by Matt40

Thanks, I was wondering too!

written 12 months ago by kukumat0
8.0 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

I think your confusion stems from the term "Nucleotide collection nr/nt" on the BLAST page to which you linked.

In that case, "nr/nt" stands for "non-redundant nucleotide." However, as you point out, NCBI also make separate databases available for download. In this case, "nr" is non-redundant protein, "nt" is non-redundant nucleotide.

Yes: you would use blastn against nt and blastp against nr. No: you cannot BLAST both "at the same time." You need to choose an appropriate combination of BLAST program and database. For example, you can BLAST nucleotide queries against the protein database by using blastx, which first translates the queries in all 6 reading frames.
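
As a rough illustration of the program/database pairing described above, here is a small shell sketch. The function name pick_blast_program and the labels "nucl" and "prot" are made up for this example; blastn, blastp, blastx and tblastn are BLAST+ program names. tblastn (protein query against a translated nucleotide database) is not mentioned above and is included only to complete the table.

```shell
# Hypothetical helper: map (query type, database type) to a BLAST+ program.
# The labels "nucl" and "prot" are illustrative, not BLAST options.
pick_blast_program() {
  case "$1:$2" in
    nucl:nucl) echo blastn ;;   # nucleotide query vs nucleotide db (e.g. nt)
    prot:prot) echo blastp ;;   # protein query vs protein db (e.g. nr)
    nucl:prot) echo blastx ;;   # query is translated in 6 frames first
    prot:nucl) echo tblastn ;;  # database is translated in 6 frames
    *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
  esac
}

pick_blast_program nucl prot   # prints: blastx
```

The printed name can then be used in place of blastp or blastn in commands like the ones in the question.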

The 2 files nr.00 and nr.01 simply mean that the database has been split into two parts because it is very large. Older BLAST versions used an additional index (alias) file; it used to be called "nr.pal" and may still be called that. Provided that 00, 01 and this file all reside in the same location, local BLAST will "stitch" the 2 parts together in the background and you just specify "nr" as the database. I have not upgraded to BLAST+ myself, so it may be that this file is no longer required there.
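
A minimal sketch of how one might check this before searching, assuming the usual layout of a protein database: an alias file NAME.pal next to sequence volumes named NAME.NN.psq. As far as I know the preformatted downloads for BLAST+ still include the alias file, but treat that as an assumption and check your own directory. The function name describe_protein_db is made up for this example.

```shell
# Hypothetical helper: report whether a split protein database in DIR
# can be addressed by its base NAME (assumes NAME.pal + NAME.NN.psq files).
describe_protein_db() {
  dir=$1
  name=$2
  if [ ! -f "$dir/$name.pal" ]; then
    echo "missing alias file $name.pal"
    return 1
  fi
  count=0
  for f in "$dir/$name".*.psq; do
    # If nothing matches, the pattern stays literal and the test fails.
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$name: $count volume(s) behind one alias"
}

# Example (paths are placeholders):
#   describe_protein_db /path/to/blastdb nr
#   blastp -query test.fasta -db /path/to/blastdb/nr
```

If the check passes, passing the base name to -db as in the commented blastp line should search all volumes, rather than listing "nr.00 nr.01" by hand.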

written 8.0 years ago by Neilfws48k

Thanks for the help!

written 8.0 years ago by Matt40

So I have the same issue, except the nt database is now in 27 parts. I downloaded all of them but cannot extract any of them because I have no disk space left. I extracted the nt.00 file first and that had a nt.pal file. Is that all I need?

Am I required to download ALL the nt files? I don't see how this is possible given the space requirements.

written 4.5 years ago by balasink10

I have the same issue. nt is now in 31 parts. What should I do?

written 4.0 years ago by schwarcz.kaiser0

You'll need ~34.2 GB for the current nt database (once extracted from the .tar.gz files). If you don't have that much free disk space, you can't run it locally.
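
A quick way to check this before extracting anything, as a sketch only: the function name has_free_space is made up, and it assumes 1 GB = 1024 * 1024 KB and that df -Pk reports available space in KB in the fourth column.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB free (assumes 1 GB = 1024 * 1024 KB).
has_free_space() {
  dir=$1
  need_kb=$(( $2 * 1024 * 1024 ))
  # df -P guarantees one line per filesystem; column 4 is "Available".
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge "$need_kb" ]
}

# 35 GB leaves a small margin over the ~34.2 GB quoted above.
if has_free_space . 35; then
  echo "enough room to extract nt here"
else
  echo "not enough room to extract nt here"
fi
```

The figure will keep growing as nt grows, so the 35 here is only a snapshot of the number given in this reply.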

written 4.0 years ago by Neilfws48k
8.0 years ago by
Digiomics160
Netherlands
Digiomics160 wrote:

Actually, the "nr" database currently has 6 parts, so it should be nr.00 to nr.05. If you have trouble using the update script, you can also download preformatted BLAST databases from the NCBI FTP server.
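
For reference, a dry-run sketch of the update-script route. update_blastdb.pl is the script shipped with BLAST+, and its --decompress option extracts the archives after download; given a base name such as nr it fetches every volume, so the number of parts does not need to be known in advance. The wrapper function print_update_commands is made up for this example and only prints the commands rather than running them.

```shell
# Dry run: print (do not execute) the download command for each database.
print_update_commands() {
  for db in "$@"; do
    echo "update_blastdb.pl --decompress $db"
  done
}

print_update_commands nr nt
# prints:
#   update_blastdb.pl --decompress nr
#   update_blastdb.pl --decompress nt
```

Run the printed commands from the directory where the databases should live, after confirming there is enough disk space.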

written 8.0 years ago by Digiomics160

I will do that. Thank you!

written 8.0 years ago by Matt40