Question: How to download the complete database Nucleotide collection (nr/nt)?
4 months ago by
Germany, Mannheim, UMM
marongiu.luigi420 wrote:

Dear all,

I need to perform a large BLAST search and I am running blastn remotely from the terminal. However, this takes far too long to return results, so I have been thinking of creating a local database to speed up the analysis. How can I download the whole nr/nt repository? I see there is one here for RefSeq. Would this be suitable? Would it already be indexed, or should I create the index with makeblastdb?

Thank you

ADD COMMENTlink modified 4 months ago by Fabio Marroni2.4k • written 4 months ago by marongiu.luigi420

Download BLAST Software and Databases

ADD REPLYlink written 4 months ago by WouterDeCoster42k

Thank you, but which files should I get for nr/nt? I understand I should use ./update_blastdb.pl --decompress ... but with what other parameters? The manual shows ./update_blastdb.pl --decompress swissprot, but I am not interested in proteins. Since the command to build a database is makeblastdb -in {input} -dbtype nucl, I tried:

$ perl ~/src/blast/bin/update_blastdb.pl --decompress nucl
Connected to NCBI
nucl not found, skipping.

So what would be the correct syntax?

ADD REPLYlink written 4 months ago by marongiu.luigi420
4 months ago by
Fabio Marroni2.4k
Italy
Fabio Marroni2.4k wrote:

You can use

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz

wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz

These are FASTA files and they are not indexed. You should use the makeblastdb command to index them.

You might also want to browse ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA and check what other databases are available.
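
As a rough sketch of the FASTA route described above, the function below streams the compressed file into makeblastdb. It assumes nt.gz has already been downloaded and that makeblastdb is on the PATH; the function name build_nt_db and its defaults are made up for illustration, and reading from standard input with "-in -" is assumed to require an explicit -title and -out.

```shell
# Sketch only: build a nucleotide BLAST database from a gzipped FASTA file.
# build_nt_db is a hypothetical helper, not part of BLAST+.
build_nt_db() {
    fasta_gz="${1:-nt.gz}"   # compressed FASTA, e.g. the nt.gz fetched with wget
    db_name="${2:-nt}"       # base name for the database files that get written

    # Decompress on the fly so the uncompressed FASTA never lands on disk.
    # "-in -" tells makeblastdb to read the sequences from standard input.
    gunzip -c "$fasta_gz" |
        makeblastdb -in - -dbtype nucl -title "$db_name" -out "$db_name"
}
```

Calling build_nt_db nt.gz nt would write the database files next to the download. For a protein file such as nr.gz, -dbtype would have to be prot instead.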

ADD COMMENTlink written 4 months ago by Fabio Marroni2.4k

Thanks. Why two databases? Shouldn't it be a single one, nt/nr?

ADD REPLYlink written 4 months ago by marongiu.luigi420

Because nt is the nucleotide database and nr is the protein database. Depending on the kind of searches you want to do, you will need to choose one.

Get the pre-formatted database files from ftp://ftp.ncbi.nih.gov/blast/db/. There is no point in getting the FASTA files and making your own. You need to download all the files with nt (or nr) in the name. Put them in one directory. Uncompress the files and that is all that should be needed.

Note: You will need tens of GB of RAM to do local searches against nt or nr.
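
The manual route above (download every archive, then uncompress) could look roughly like this once the files are on disk. This is a sketch that assumes the archives follow the nt.00.tar.gz, nt.01.tar.gz, ... naming used in that FTP directory; extract_blast_volumes is a hypothetical helper name.

```shell
# Sketch only: unpack every downloaded volume of one pre-formatted database.
# extract_blast_volumes is a hypothetical helper, not part of BLAST+.
extract_blast_volumes() {
    db="$1"          # database name, e.g. nt or nr
    dir="${2:-.}"    # directory holding the downloaded .tar.gz archives

    for archive in "$dir"/"$db".*.tar.gz; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$archive" ] || continue
        # Extract next to the archive so all volumes end up in one directory.
        tar -xzf "$archive" -C "$dir"
    done
}
```

For example, extract_blast_volumes nt /data/blastdb would unpack all nt volumes in place (the path is only an example). Pointing the BLASTDB environment variable at that directory lets blastn find the database by name.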

ADD REPLYlink modified 4 months ago • written 4 months ago by genomax74k

Thank you, this is clearer. And if I wanted to use update_blastdb.pl, what would be the right syntax? Would it be better than downloading manually?

ADD REPLYlink written 4 months ago by marongiu.luigi420
perl update_blastdb.pl --decompress nt
perl update_blastdb.pl --decompress nr

Using this method will download all the chunks automatically, without you having to fetch multiple tar files by hand. Make sure you have enough space available locally.
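
One way to act on the disk-space warning is to check free space before starting the download. The sketch below assumes update_blastdb.pl is in the current directory; free_gb and fetch_db are hypothetical helper names, and the required size is a number you supply yourself, since the real size of nt or nr changes over time.

```shell
# Sketch only: refuse to start the download when the disk looks too small.

# Print the free space, in whole GB, on the filesystem holding a directory.
free_gb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "${1:-.}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Download one database only if at least need_gb GB are free here.
fetch_db() {
    db="$1"         # database name, e.g. nt
    need_gb="$2"    # your own estimate of the space required, in GB

    if [ "$(free_gb .)" -lt "$need_gb" ]; then
        echo "not enough free space for $db (need ${need_gb} GB)" >&2
        return 1
    fi
    perl update_blastdb.pl --decompress "$db"
}
```

Running perl update_blastdb.pl --showall lists the database names the script accepts, which explains why a name such as nucl is rejected while nt works.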

ADD REPLYlink modified 4 months ago • written 4 months ago by genomax74k

Thanks! It worked fine.

ADD REPLYlink written 4 months ago by marongiu.luigi420

Yes, I think that genomax's suggestion is good. It makes no sense downloading fasta and making the db when you can download the formatted db!

ADD REPLYlink written 4 months ago by Fabio Marroni2.4k

Thank you, I used update_blastdb and managed to create the local database. Alas, the speed of search is not much better than in remote. It is not so much a problem of RAM as of processor speed, I'd say. Perhaps I can speed it up using a supercluster... anyway, the pipeline works.

ADD REPLYlink written 4 months ago by marongiu.luigi420

You could have a look at Diamond

ADD REPLYlink written 4 months ago by WouterDeCoster42k

Interesting! I'll look into it, thanks.

ADD REPLYlink written 4 months ago by marongiu.luigi420

Alas, the speed of search is not much better than in remote.

If you have access to a local cluster, then with multiple threads/cores and the entire database index read into memory, the search should be fast. How much RAM did you allocate to the job and how many cores did you use?

ADD REPLYlink modified 4 months ago • written 4 months ago by genomax74k

The desktop PC I am using has 64 GB of RAM and 16 threads. Can I assign RAM/threads on the blastn command directly? Otherwise, I will switch to the cluster and allocate resources using qsub.

ADD REPLYlink written 4 months ago by marongiu.luigi420

Did you look at the inline help for blastn command? If you did not specify num_threads then you likely used just one core. 64G may not be enough for nt/nr searches. I would move to the cluster.

ADD REPLYlink written 4 months ago by genomax74k

Yep, it is -num_threads integer. If the RAM is not enough, then cluster it is. Thanks
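
For reference, a minimal sketch of a local search that sets the thread count explicitly. It assumes blastn is on the PATH and that the nt database can be found (for instance via BLASTDB); run_blastn, the tabular output format, and the use of getconf to count cores are illustrative choices, not something stated in this thread.

```shell
# Sketch only: run blastn against a local nt database on several threads.
# run_blastn is a hypothetical helper, not part of BLAST+.
run_blastn() {
    query="$1"    # FASTA file with the query sequences
    out="$2"      # file that receives the results
    # Use the third argument if given, else the number of online cores,
    # falling back to 1 when getconf cannot report it.
    threads="${3:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"

    # -outfmt 6 requests tab-separated output, one hit per line.
    blastn -query "$query" -db nt -num_threads "$threads" \
           -outfmt 6 -out "$out"
}
```

On the 16-thread desktop mentioned above, run_blastn queries.fa hits.tsv 16 would use every thread. There is no corresponding blastn option shown here for reserving RAM, so on a cluster the memory request would go through the scheduler (for example qsub).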

ADD REPLYlink written 4 months ago by marongiu.luigi420