Best command line tool for downloading sequencing reads
11 months ago
blackadder ▴ 30

Hello there,

I am using enaBrowserTools (enaDataGet) for bulk downloading raw reads from ENA in a pipeline of mine.

Over the last few weeks I have been getting this error message a lot when I use this tool:

Error with FTP transfer: <urlopen error 421 There are too many connected users, please try later.> .

I looked into it a bit, and apparently it is caused by a connection limit that ENA has set up, which blocks my enaDataGet calls.

And that is an issue for me!

In the past I used fasterq-dump, but I do not know if that is the best option nowadays.

So I wanted to make this post in order to see if there are better alternatives and in general for people to discuss the available tools and their pros and cons!

Thanking you in advance.

enaBrowserTools reads Fasterq-dump sequencing • 1.3k views
11 months ago
GenoMax 146k

At least two additional tools are mentioned in the posts linked below.

Download NCBI data using sratoolkit in anaconda - bio package from @Istvan
Access to fastq files on SRA Run browser - fastq-dl package

If any tools use the FTP interface under the covers then you will likely hit the same problem. If you are downloading hundreds of samples then all tools will likely encounter some issue in time.
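
If you stay with an FTP-based tool, one way to live with the 421 error is to wrap each call in a retry loop that waits longer after every failure. The following is only a minimal sketch: `retry`, `MAX_TRIES` and `BASE_DELAY` are hypothetical names made up for this example, not options of enaDataGet or any other tool, and the accession in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# retry: run a command until it succeeds, doubling the pause between attempts.
# MAX_TRIES and BASE_DELAY are hypothetical knobs for this sketch only.
retry() {
    local max_tries="${MAX_TRIES:-5}"   # total attempts before giving up
    local delay="${BASE_DELAY:-30}"     # seconds to wait after the first failure
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, sleeping ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Usage (placeholder accession):
# retry enaDataGet -f fastq ERR0000000
```

This does not remove the server-side limit; it only spaces out the attempts so that a pipeline does not die on the first refused connection.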


Hello, and thank you for the reply,

I don't think the SRA Run browser works for me because I am looking for command-line tools. As for sratoolkit, and more specifically fasterq-dump, I think it is a good tool and I have used it before! I was just wondering if there are other tools out there!


Do not go by the titles of the Biostars posts, which are generated automatically by the Biostars code that parses the links above. Actually click on the links and check the tools.


Oh you are right...I totally missed that! Thank you!

11 months ago

I've had good experiences recently with fetchngs: https://nf-co.re/fetchngs/1.10.1

    nextflow run nf-core/fetchngs \
        -profile singularity \
        --input ids.csv \
        --outdir <OUTDIR>

This will download the relevant Singularity containers (or Docker images if you use the docker profile) and then fetch the ENA or SRA IDs listed in ids.csv, either as FASTQ files directly from ENA via curl or as SRA files that are then converted to FASTQ via sratools. It will rerun failed download jobs.
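
For the input file, a small helper like the one below can build ids.csv from a list of accessions. This is a sketch under assumptions: as far as I know, fetchngs reads one accession per line, so check the pipeline documentation for the version you run. `make_ids_file` is a hypothetical helper name and the accessions are placeholders.

```shell
#!/usr/bin/env bash
# make_ids_file: write the given accessions to a file, one per line.
# Strips stray whitespace, drops empty entries and removes duplicates
# while keeping the original order. Hypothetical helper for this sketch.
make_ids_file() {
    local out="$1"
    shift
    printf '%s\n' "$@" | tr -d ' \t\r' | awk 'NF && !seen[$0]++' > "$out"
}

# Usage (placeholder accessions):
# make_ids_file ids.csv SRR0000001 ERR0000002 SRR0000001
```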


Oh, I did not know about this! Thank you!

11 months ago
dsull ★ 6.9k

I typically find going to https://sra-explorer.info/ to be the easiest way to navigate the SRA and retrieve download links (which I can then download with wget or curl).
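
Once the links are saved to a text file, a short loop is enough to fetch them. The sketch below is one possible way to do it, not the commands sra-explorer itself generates: `download_all` and the file name `urls.txt` are hypothetical, and the retry count and delay are arbitrary. It downloads one file at a time, which keeps the number of open connections low, and relies on curl's own `--retry` handling, which according to the curl manual covers transient errors such as FTP 4xx responses.

```shell
#!/usr/bin/env bash
# download_all: fetch every URL listed in a text file, one after another.
# Arguments: <file with one URL per line> [output directory]
# Hypothetical helper for this sketch; returns non-zero if any download failed.
download_all() {
    local url_file="$1"
    local outdir="${2:-.}"
    local failed=0
    local url
    mkdir -p "$outdir"
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and comment lines.
        case "$url" in
            ''|'#'*) continue ;;
        esac
        # -C - resumes a partial file; --retry re-attempts transient errors.
        if ! curl --fail --location --retry 5 --retry-delay 30 -C - \
                -o "$outdir/${url##*/}" "$url"; then
            echo "failed: $url" >&2
            failed=$((failed + 1))
        fi
    done < "$url_file"
    [ "$failed" -eq 0 ]
}

# Usage:
# download_all urls.txt fastq/
```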

Some command-line tools (see the entire thread):


Seconding sra-explorer.info. The suggested nf-core pipeline will work, but I find it overly heavy for a task as trivial as running a few wget lines.
