Question: Is There A Faster Replacement For Fastq-Dump (From The Sra-Toolkit)?
4.2 years ago by
Devon Ryan77k
Freiburg, Germany
Devon Ryan77k wrote:

I occasionally need to reprocess previously published datasets, which are often stored in the Sequence Read Archive (SRA). For the most part I need the raw fastq files, so I use fastq-dump. While that gets the job done, it's annoyingly slow for what it's actually doing. Has anyone come across a program that can extract reads in fastq format from SRA files more quickly? While I could presumably write a faster program myself, I'd like to avoid reinventing the wheel if possible.

I should note that I'm aware that many datasets are available in fastq format via ENA, but unfortunately not all of them are.

sra fastq • 5.1k views
ADD COMMENTlink modified 29 days ago by ATpoint3.2k • written 4.2 years ago by Devon Ryan77k

Incidentally, the way fastq-dump works has many other limitations: the way it handles the internet connection and its "security handshakes" (what those are, I don't know) gets in the way.

It is the only bioinformatics program so far that does not work on Bash on Windows! Think about that for a second. For my book I had to find a simple replacement for it and came up with wonderdump, a replacement for the network access part of fastq-dump. It uses plain curl for the download, and it does indeed work much faster than the regular fastq-dump.
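For readers who want to try the same idea by hand, here is a minimal sketch of "download with curl, then convert the local file". It is not the wonderdump script itself. The function names `run` and `fetch_then_dump` and the `DRY_RUN` switch are made up for illustration, and the URL has to be supplied by the caller, since this sketch makes no assumption about where a given run is hosted.

```shell
#!/bin/sh
# Sketch only: fetch an .sra file with curl, then run fastq-dump on the local copy.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_then_dump() {
    acc=$1   # run accession, e.g. SRR1286321
    url=$2   # hypothetical: full URL of the .sra file, provided by the caller
    # -L follows redirects, --fail makes HTTP errors exit non-zero,
    # -C - resumes an interrupted download.
    run curl -L --fail -C - -o "${acc}.sra" "$url" &&
    # fastq-dump also accepts a path to a local .sra file.
    run fastq-dump --split-files "./${acc}.sra"
}
```

Calling `DRY_RUN=1 fetch_then_dump SRR1286321 <url>` first is a cheap way to check what would be executed before starting a large download.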

ADD REPLYlink modified 16 months ago • written 16 months ago by Istvan Albert ♦♦ 75k

Last I checked, Japan's SRA mirror had not yet moved over to the binary SRA format, so you can still grab fastqs from there. Might be a useful workaround.

ADD REPLYlink written 4.2 years ago by Ying W3.8k

Oh the amount of time noticing that would have saved me! :P Good on the Japanese for so far avoiding the SRA format annoyance!

ADD REPLYlink written 4.2 years ago by Devon Ryan77k

honestly I really doubt it - but I would agree that this binary SRA format is a major PITA

ADD REPLYlink written 4.2 years ago by Istvan Albert ♦♦ 75k

I also doubt anyone has written a second method if this one works and is supported. Dumping from SRA isn't a task you have to do repeatedly and therefore is not a target for optimization. Try to use your fastest hard-drives, assuming IO is the bottleneck. Many Linux systems have a ram-drive on /dev/shm you should look into for vastly sped-up IO.

ADD REPLYlink written 4.2 years ago by karl.stamm3.2k

I/O is not a bottleneck. In fact, I can invoke an instance for every core on my workstation and still not max out I/O (or even come very close, for that matter).
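As an illustration of running one instance per core, the sketch below starts several fastq-dump processes at once, one per accession, using `xargs -P`. The function name `dump_all`, the accession list file and the `DUMP_CMD` override are hypothetical. `xargs -P` is available in GNU and BSD xargs but is not part of POSIX.

```shell
#!/bin/sh
# Sketch only: convert many runs concurrently, one fastq-dump process per accession.

dump_all() {
    list=$1                            # hypothetical file: one run accession per line
    jobs=${2:-4}                       # how many fastq-dump processes to run at once
    dump_cmd=${DUMP_CMD:-fastq-dump}   # set DUMP_CMD=echo to preview the calls
    # -n 1 passes one accession per invocation, -P sets the number of parallel jobs.
    xargs -P "$jobs" -n 1 "$dump_cmd" --split-files < "$list"
}
```

On a GNU system, `dump_all accessions.txt "$(nproc)"` would use one process per core. This only helps when there are several runs to convert; it does not speed up a single run.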

ADD REPLYlink written 4.2 years ago by Devon Ryan77k

fastq-dump will never be a good choice. Its download speed is not very fast, and it always comes with confusing problems, such as: 2016-11-12T08:33:35 fastq-dump.2.7 err: item not found while constructing within virtual database module - the path 'SRR1286321' cannot be opened as database or table. I prefer to use wget or curl.

ADD REPLYlink written 16 months ago by Shicheng Guo4.7k
19 months ago by
endrebak570 wrote:

sam-dump seems a lot faster for me:

sam-dump <your_data> | head | grep -v '^@' | awk '{print "@"$1"\n"$10"\n+\n"$11}'

Or if your data is paired:

sam-dump <your_data> | grep -v '^@' | awk 'NR%2==1 {print "@"$1"\n"$10"\n+\n"$11}' > samplename_1.fastq 
sam-dump <your_data> | grep -v '^@' | awk 'NR%2==0 {print "@"$1"\n"$10"\n+\n"$11}' > samplename_2.fastq

( awk lines stolen from here: )
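The paired commands above run sam-dump twice. A possible single-pass variant is sketched below: one awk process reads the SAM stream once and writes alternating records to the two mate files. The function name `sam_to_paired_fastq` is made up, and the sketch assumes that the two mates of each pair appear on consecutive lines with no extra (for example secondary) records in between, which is the same assumption the NR%2 trick makes.

```shell
#!/bin/sh
# Sketch only: split SAM records from stdin into <prefix>_1.fastq and <prefix>_2.fastq.

sam_to_paired_fastq() {
    prefix=$1
    awk -F '\t' -v p="$prefix" '
        /^@/ { next }    # skip SAM header lines
        {
            n++
            # odd records go to mate file 1, even records to mate file 2
            out = (n % 2 == 1) ? (p "_1.fastq") : (p "_2.fastq")
            # SAM columns: 1 = read name, 10 = sequence, 11 = base qualities
            print "@" $1 "\n" $10 "\n+\n" $11 > out
        }'
}
```

Usage would be along the lines of `sam-dump <your_data> | sam_to_paired_fastq samplename`.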

ADD COMMENTlink written 19 months ago by endrebak570
29 days ago by
sutturka110 wrote:

Please check my answer in this thread. It might be useful.

ADD COMMENTlink written 29 days ago by sutturka110
29 days ago by
ATpoint3.2k wrote:

I found parallel-fastq-dump quite useful. It is a wrapper from Renan Valieris that makes use of the -N and -X options of fastq-dump to convert multiple chunks of the SRA file in parallel, merging the chunks into the final fastq after successful conversion. It requires python3 and worked well in my hands. Easy install with conda: conda install parallel-fastq-dump
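To make the -N/-X chunking idea concrete, here is a rough sketch of how a run could be split into spot ranges. This is not the wrapper's actual code: the function name `chunk_ranges` and the way the remainder is distributed are my own choices, and the total number of spots is assumed to be known already.

```shell
#!/bin/sh
# Sketch only: print "first last" spot ranges that split a run into n chunks.

chunk_ranges() {
    total=$1   # total number of spots in the run (assumed to be known)
    n=$2       # number of chunks, i.e. parallel fastq-dump jobs
    base=$(( total / n ))
    extra=$(( total % n ))
    start=1
    i=0
    while [ "$i" -lt "$n" ]; do
        size=$base
        # spread the remainder over the first chunks
        if [ "$i" -lt "$extra" ]; then size=$(( size + 1 )); fi
        if [ "$size" -gt 0 ]; then
            end=$(( start + size - 1 ))
            echo "$start $end"
            start=$(( end + 1 ))
        fi
        i=$(( i + 1 ))
    done
}

# Possible use, one background fastq-dump per range (not executed here):
# chunk_ranges "$total_spots" 4 | {
#     while read -r first last; do
#         fastq-dump -N "$first" -X "$last" -O "chunk_$first" --split-files SRR1286321 &
#     done
#     wait
# }
```

The per-chunk fastq files then still have to be concatenated in spot order, which is the merging step the wrapper takes care of.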

ADD COMMENTlink written 29 days ago by ATpoint3.2k