Downloading NCBI Blast nt database
13 months ago
rgray • 0

I would like to download the full nt database to run a BLAST search on a large genome on an HPC cluster. I have installed BLAST using conda and set it up accordingly.

When I run the step below (edited from the manual (https://www.ncbi.nlm.nih.gov/books/NBK569850/) for the nr database):

~/miniconda3/envs/blast/bin/update_blastdb.pl --passive --decompress nt

I get the error:

Connected to NCBI
Downloading nt (90 volumes) ...
Downloading nt.00.tar.gz...corrupt download, trying again.
Downloading nt.00.tar.gz...corrupt download, trying again.
too many failures, aborting download!

Has anyone else encountered this issue? Thanks.

nucleotide blast nt genomics database • 2.6k views

If you have an intrusion prevention/detection device and/or a firewall between the server and internet then this could be causing the problem you see. You will need to make sure your connections are allowed through.


Thanks for your reply. I thought the --passive option would get around this (from the help text: "--passive: Use passive FTP, useful when behind a firewall or working in the cloud"), but the download still fails. Any advice would be great.


See my comment below. If @Istvan's solution works for you then great.

13 months ago

You can always download the files directly from the NCBI server. It is fairly simple to automate as well:

seq -w 00 89 | parallel wget https://ftp.ncbi.nlm.nih.gov/blast/db/nt.{}.tar.gz
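
Since the original error was a "corrupt download", it may be worth verifying each volume after fetching it. Below is a minimal sketch, not a tested recipe: it assumes the server publishes a checksum file named `nt.XX.tar.gz.md5` next to each archive, that the file is in the format `md5sum -c` expects, and that GNU `md5sum` and `wget` are available. The function names `verify_volume` and `fetch_volume` are made up for illustration.

```shell
# Sketch: fetch one nt volume plus its checksum file and verify it.
# BASE_URL is the location used in the wget command above; the ".md5"
# file naming is an assumption about the server layout.
BASE_URL="https://ftp.ncbi.nlm.nih.gov/blast/db"

# Return 0 if the archive matches the checksum recorded in "<archive>.md5".
verify_volume() {
    local archive="$1"
    [ -f "$archive" ] && [ -f "$archive.md5" ] || return 1
    # md5sum -c looks up the file name stored inside the .md5 file relative
    # to the current directory, so run the check from the archive's directory.
    ( cd "$(dirname "$archive")" \
        && md5sum -c "$(basename "$archive").md5" ) >/dev/null 2>&1
}

# Download a volume (e.g. "00") up to three times until its checksum verifies.
fetch_volume() {
    local name="nt.$1.tar.gz" attempt
    for attempt in 1 2 3; do
        wget -q -O "$name" "$BASE_URL/$name"
        wget -q -O "$name.md5" "$BASE_URL/$name.md5"
        if verify_volume "$name"; then
            echo "$name OK"
            return 0
        fi
        echo "$name failed checksum (attempt $attempt)" >&2
    done
    return 1
}
```

With those functions defined, something like `for v in $(seq -w 00 89); do fetch_volume "$v" || break; done` would stop at the first volume that cannot be fetched cleanly. If a firewall or inspection device is altering the traffic, the checksum will keep failing, which at least confirms where the problem lies.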

If a deep packet inspection device is causing this problem, then the only way around it is to get those network flows whitelisted.


I have still had no luck, so I suspect it could be a firewall issue with my HPC server. I am able to download the files using wget; however, when I try to extract them (tar -zxvpf nt.89.tar.gz), I get the error:

nt.nal
nt.89.nin
nt.89.nhr
nt.89.nsq

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

I will update what works if I am able to change the firewall permissions.
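
The "Unexpected EOF" message above usually points to a truncated or damaged archive rather than a problem with tar itself. As a rough sketch (assuming the archives are named `nt.*.tar.gz` and sit in one directory; the function name `list_corrupt_archives` is hypothetical), `gzip -t` can flag the bad files without extracting anything:

```shell
# Print the path of every nt.*.tar.gz in the given directory (default: .)
# that fails gzip's built-in integrity test.
list_corrupt_archives() {
    local dir="${1:-.}" f
    for f in "$dir"/nt.*.tar.gz; do
        [ -e "$f" ] || continue          # no matching files: skip the literal glob
        if ! gzip -t "$f" 2>/dev/null; then
            echo "$f"
        fi
    done
}
```

Running `list_corrupt_archives .` in the download directory lists the volumes that need to be deleted and fetched again. Note that this only checks the gzip layer; it does not compare against the checksums published by NCBI.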


Thanks, I will give this a go too!

13 months ago
size_t ▴ 120

Try this:

ascp -T -l 200M -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh --host=ftp.ncbi.nih.gov --user=anonftp --mode=recv /blast/db/FASTA/nt.gz ./

ref:ascp ncbi
