How can I speed up wget from UniProt for UniRef90/50 fasta?
11 months ago
O.rka ▴ 710

How can I speed up the download here? https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz

I'm running wget on our compute servers (on the head node) and on my personal computer, which I know has a fast internet connection, but the download is VERY slow: wget estimates it will take days to finish.

How can I speed up this download process?

wget download uniprot uniref
11 months ago
Mensur Dlakic ★ 27k

I think you need a program that can create multiple download streams. Here is one example:

https://github.com/aria2/aria2

With aria2, a week ago I downloaded the compressed UniRef90 FASTA file in ~12 hours.
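
A minimal sketch of that approach for the files in the question, assuming aria2c is installed and on PATH. `fetch_uniref` is a hypothetical helper name, 5 connections is only an illustrative value, and the UniRef50 URL is assumed to follow the same layout as the UniRef90 link above. Without `-x` (`--max-connection-per-server`), aria2c opens just one connection per server, which is its default, so that flag is the one that matters here.

```shell
#!/bin/sh
# Sketch: download UniRef FASTA files with aria2c, several connections each.
# Assumes aria2c is installed; fetch_uniref is a hypothetical helper name.
UNIREF_BASE="https://ftp.uniprot.org/pub/databases/uniprot/uniref"

# Usage: fetch_uniref DEST_DIR LEVEL...   (LEVEL is e.g. 90 or 50)
fetch_uniref() {
  local dest=$1 level
  shift
  for level in "$@"; do
    # -x: max connections to the server, -s: pieces the file is split into,
    # -c: continue a partial file if the command is re-run after an interruption,
    # -d: directory to save into.
    aria2c -x 5 -s 5 -c -d "$dest" \
      "$UNIREF_BASE/uniref$level/uniref$level.fasta.gz" || return 1
  done
}
```

For example, `fetch_uniref /data/uniref 90 50` fetches both files one after the other; because of `-c`, re-running the same command after a dropped connection resumes the partial file instead of starting over.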

Does it need to run multi-threaded?

It doesn't run multithreaded. It simply opens multiple download connections (something like --max-connection-per-server=5).
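
To make the "multiple download connections" idea concrete, here is a rough sketch of segmented downloading using curl's byte-range option (`-r`) rather than aria2c; it is not aria2c's actual implementation. It assumes curl is installed and that the server answers a HEAD request with a Content-Length and honours HTTP Range requests (if the server ignores ranges, every part would contain the whole file). The helper names `byte_ranges` and `segmented_fetch` are hypothetical.

```shell
#!/bin/sh
# Sketch of segmented downloading with curl (not aria2c's actual code).
# Assumes: curl is installed, the server reports Content-Length and honours
# HTTP Range requests. byte_ranges/segmented_fetch are hypothetical names.

# Print N contiguous, inclusive "start-end" byte ranges that cover SIZE bytes.
byte_ranges() {
  local size=$1 n=$2 chunk start end
  chunk=$(( (size + n - 1) / n ))   # ceiling division, so N chunks suffice
  start=0
  while [ "$start" -lt "$size" ]; do
    end=$(( start + chunk - 1 ))
    if [ "$end" -ge "$size" ]; then end=$(( size - 1 )); fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Download URL to OUT over N parallel connections, one byte range each,
# then concatenate the parts in their original order.
segmented_fetch() {
  local url=$1 out=$2 n=${3:-5} size range pid pids i j
  # File size from the last response in the redirect chain (HEAD request).
  size=$(curl -sIL "$url" | tr -d '\r' |
    awk 'tolower($1) == "content-length:" { len = $2 } END { print len }')
  if [ -z "$size" ]; then
    echo "no Content-Length for $url" >&2
    return 1
  fi
  i=0
  pids=
  for range in $(byte_ranges "$size" "$n"); do
    curl -sf -r "$range" -o "$out.part$i" "$url" &   # one connection per range
    pids="$pids $!"
    i=$(( i + 1 ))
  done
  for pid in $pids; do
    wait "$pid" || return 1                          # stop if any range failed
  done
  : > "$out"
  j=0
  while [ "$j" -lt "$i" ]; do
    cat "$out.part$j" >> "$out" && rm -f "$out.part$j"
    j=$(( j + 1 ))
  done
}
```

For example, `segmented_fetch https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz uniref50.fasta.gz 5`. Here the parallelism comes from separate background curl processes, each holding one connection. Running `gzip -t` on the reassembled file is a cheap sanity check that the parts were joined correctly.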

A few questions:

  • What can I use as a drop-in replacement for this command? wget -v -P ${DATABASE_DIRECTORY} https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz
  • How can I specify the maximum number of connections the server will allow?
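
A rough aria2c counterpart of that wget command, sketched under the assumption that aria2c is installed. wget's `-P` (`--directory-prefix`) maps to aria2c's `-d` (`--dir`), and wget's `-v` is already wget's default verbosity, so it needs no counterpart. `--max-connection-per-server` (`-x`) is only a client-side cap (aria2c accepts 1 to 16); how many connections a given server actually accepts can only be found by trying. The wrapper name `aria2_get` and its default of 4 connections are hypothetical, illustrative choices.

```shell
#!/bin/sh
# Hypothetical drop-in for: wget -v -P "$DATABASE_DIRECTORY" URL
# Usage: aria2_get DEST_DIR URL [MAX_CONNECTIONS]
aria2_get() {
  local dest=$1 url=$2 conns=${3:-4}
  # --dir plays the role of wget's -P/--directory-prefix.
  # --max-connection-per-server is the client-side cap (aria2c allows 1-16);
  # the server may still refuse or throttle extra connections.
  aria2c --max-connection-per-server="$conns" --split="$conns" \
    --dir="$dest" "$url"
}
```

For the GTDB file: `aria2_get "${DATABASE_DIRECTORY}" https://data.gtdb.ecogenomic.org/releases/release207/207.0/auxillary_files/gtdbtk_r207_v2_data.tar.gz 5`.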

For someone of your experience in this field, I think it is lazy to keep asking these types of questions. I already answered your original question, and the rest comes down to typing aria2c -h and going through the options. Nobody can tell you without testing how many connections any given server will allow.

Touché. I'll look into it. Do you usually use --max-connection-per-server=5?

I typically use 4 or 5.
