11 days ago
rgray • 0

I would like to download the full nt database to run a BLAST search on a large genome on an HPC cluster. I have installed BLAST using conda and set it up accordingly.

When I run the step below (edited from the manual (https://www.ncbi.nlm.nih.gov/books/NBK569850/) for the nr database):

~/miniconda3/envs/blast/bin/update_blastdb.pl --passive --decompress nt

I get the error:

Connected to NCBI


Has anyone else encountered this issue? Thanks.

nucleotide blast nt genomics database • 590 views
If there is an intrusion prevention/detection device and/or a firewall between the server and the internet, it could be causing the problem you see. You will need to make sure your connections are allowed through.

Thanks for your reply. I thought the --passive option should get around this ("--passive Use passive FTP, useful when behind a firewall or working in the cloud"), but the download still fails. Any advice would be great.

See my comment below. If @Istvan's solution works for you then great.

11 days ago

It is fairly simple to automate as well.

seq -w 00 89 | parallel wget https://ftp.ncbi.nlm.nih.gov/blast/db/nt.{}.tar.gz
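A slightly more defensive variant of the one-liner above, as a sketch only: it resumes partial downloads with wget -c and checks each archive against the .md5 file that NCBI publishes next to each volume. The function names (volume_url, fetch_volume, fetch_range) are made up for illustration, and the 00 to 89 volume range is taken from the command above; the number of nt volumes changes over time, so check the FTP listing first.

```shell
#!/usr/bin/env bash
# Sketch: resumable download of nt volumes with checksum verification.
# Assumes wget, md5sum and GNU seq are available on the HPC node.
base_url="https://ftp.ncbi.nlm.nih.gov/blast/db"

# Build the URL for one volume, e.g. "volume_url 07".
volume_url() {
    printf '%s/nt.%s.tar.gz\n' "$base_url" "$1"
}

# Download one volume (resuming a partial file) plus its .md5, then verify.
fetch_volume() {
    local url
    url="$(volume_url "$1")"
    wget -c -q "$url" || return 1
    wget -q -O "nt.$1.tar.gz.md5" "$url.md5" || return 1
    md5sum -c "nt.$1.tar.gz.md5"
}

# Fetch every volume in a range, reporting failures instead of stopping.
fetch_range() {
    local vol
    for vol in $(seq -w "$1" "$2"); do
        fetch_volume "$vol" || echo "FAILED: nt.$vol" >&2
    done
}
```

After sourcing the script, "fetch_range 00 89" would walk through all volumes; any volume reported as FAILED can be retried on its own with fetch_volume.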

If a deep packet inspection device is causing this problem, then the only way around it is to get those network flows whitelisted.

I have still had no luck, so I suspect it could be a firewall issue with my HPC server. I am able to download files using wget; however, when I try to extract them (tar -zxvpf nt.89.tar.gz) I get the error:

nt.nal
nt.89.nin
nt.89.nhr
nt.89.nsq

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now


I will update what works if I am able to change the firewall permissions.
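An "Unexpected EOF in archive" message like the one above often points to a truncated or interrupted download rather than a problem with tar itself. As a rough sketch, the archive can be tested with gzip -t before extraction; the helper names archive_ok and safe_extract are hypothetical, and the tar options are the same ones used in the command above.

```shell
#!/usr/bin/env bash
# Sketch: test a downloaded archive before trying to extract it.

# Return 0 if the gzip stream is intact, non-zero if truncated or corrupt.
archive_ok() {
    gzip -t "$1" 2>/dev/null
}

# Extract only when the integrity test passes.
safe_extract() {
    local archive="$1"
    if archive_ok "$archive"; then
        tar -xzvpf "$archive"
    else
        echo "Corrupt or incomplete: $archive (re-download it)" >&2
        return 1
    fi
}
```

For example, "safe_extract nt.89.tar.gz" would refuse to unpack a partial file; re-running wget -c on the same URL may be enough to complete it.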

Thanks, I will give this a go too!

10 days ago
size_t ▴ 60

Try this:

ascp -T -l 200M -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh --host=ftp.ncbi.nih.gov --user=anonftp --mode=recv /blast/db/FASTA/nt.gz ./


Ref: ascp ncbi