Is There A Faster Replacement For Fastq-Dump (From The Sra-Toolkit)?
10.5 years ago

I occasionally need to reprocess previously published datasets, often stored in the Short Read Archive. For the most part I need the raw fastq files, so I use fastq-dump. While that gets the job done, it's annoyingly slow for what it's actually doing. Has anyone come across a program that can extract reads in fastq format from SRA files more quickly? While I could presumably write a faster program, I'd like to avoid reinventing the wheel if possible.

I should note that I'm aware that many datasets are available in fastq format via ENA, but unfortunately not all of them are.

fastq sra • 18k views
ADD COMMENT
4

Incidentally the way fastq-dump works has many other limitations - the way it handles the internet connection and its "security handshakes" (what that is I don't know) get in the way.

It is the only bioinformatics program so far that does not work on Bash on Windows! Think about that for a second. For my book I had to find a simple replacement for it and came up with wonderdump, a replacement for the network access of fastq-dump. It uses plain curl for the download, and it does indeed work much faster than the regular fastq-dump.

http://data.biostarhandbook.com/scripts/wonderdump.sh

ADD REPLY
2

Last I checked, Japan's SRA mirror had not yet moved over to the binary SRA format, so you can still grab fastqs from there. Might be a useful workaround.

ADD REPLY
0

Oh the amount of time noticing that would have saved me! :P Good on the Japanese for so far avoiding the SRA format annoyance!

ADD REPLY
1

honestly I really doubt it - but I would agree that this binary SRA format is a major PITA

ADD REPLY
0

I also doubt anyone has written a second method if this one works and is supported. Dumping from SRA isn't a task you have to do repeatedly and therefore is not a target for optimization. Try to use your fastest hard-drives, assuming IO is the bottleneck. Many Linux systems have a ram-drive on /dev/shm you should look into for vastly sped-up IO.

ADD REPLY
2

I/O is not the bottleneck. In fact, I can invoke an instance for every core on my workstation and still not max out I/O (or even come close, for that matter).

ADD REPLY
0

fastq-dump will never be a good choice. The download speed is not so fast, and there are always confusing problems, such as: 2016-11-12T08:33:35 fastq-dump.2.7 err: item not found while constructing within virtual database module - the path 'SRR1286321' cannot be opened as database or table. I prefer wget or curl.

ADD REPLY
2
6.4 years ago
ATpoint 83k

I found parallel-fastq-dump quite useful. It is a wrapper from Renan Valieris that uses the -N and -X options of fastq-dump to convert multiple chunks of the SRA file in parallel, merging the chunks into the final fastq after successful conversion. It requires python3 and worked well in my hands. Easy install with conda: conda install parallel-fastq-dump
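
To make the -N/-X idea concrete, here is a minimal sketch of the chunking step only. It is not the wrapper's actual code: print_chunk_cmds is a hypothetical helper, the spot count of 1000 is made up for illustration, and the accession is simply one that appears elsewhere in this thread. It just prints the fastq-dump commands rather than running them.

```shell
# Hypothetical helper (not part of sra-tools or parallel-fastq-dump):
# print one fastq-dump command per contiguous chunk of spots.
#   $1 = run accession, $2 = total spot count (assumed known), $3 = chunks
print_chunk_cmds() (
    acc=$1; total=$2; n=$3
    size=$(( (total + n - 1) / n ))      # spots per chunk, rounded up
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(( i * size + 1 ))
        end=$(( (i + 1) * size ))
        if [ "$start" -gt "$total" ]; then break; fi
        if [ "$end" -gt "$total" ]; then end=$total; fi
        # -N = first spot to dump, -X = last spot to dump, -O = output dir
        echo "fastq-dump -N $start -X $end -O chunk_$i $acc"
        i=$(( i + 1 ))
    done
)

# 1000 spots is a made-up count for illustration.
print_chunk_cmds SRR1509032 1000 4
```

Each printed command could then be run in the background, with the per-chunk fastq files concatenated in chunk order once all of them finish successfully.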

Or simply get data directly in fastq format: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
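
For the ENA route, a rough sketch of how the FTP directory for a run can be derived from its accession. This assumes the ENA layout as I understand it (first six characters, then a zero-padded subdirectory built from the digits beyond the sixth, then the accession); ena_fastq_dir is a hypothetical helper name, and whether a given run actually has fastq files there still has to be checked.

```shell
# Hypothetical helper: print the ENA FTP directory expected to hold the
# fastq files of a run accession (layout assumed from the ENA docs).
ena_fastq_dir() (
    acc=$1
    base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/$(printf '%s' "$acc" | cut -c1-6)"
    case ${#acc} in
        9)  echo "$base/$acc/" ;;                                       # 6 digits
        10) echo "$base/00$(printf '%s' "$acc" | cut -c10)/$acc/" ;;    # 7 digits
        11) echo "$base/0$(printf '%s' "$acc" | cut -c10-11)/$acc/" ;;  # 8 digits
        12) echo "$base/$(printf '%s' "$acc" | cut -c10-12)/$acc/" ;;   # 9 digits
        *)  echo "unexpected accession: $acc" >&2; return 1 ;;
    esac
)

ena_fastq_dir SRR1509032
```

The printed directory can be handed to wget or curl; files inside are typically named after the accession, with _1 and _2 suffixes for paired data.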

ADD COMMENT
0

Indeed. The link for direct download of fastq files from the ENA archive, generated via sra-explorer, gives me a download speed of ~15 MB/s, which is orders of magnitude faster than fastq-dump.

ADD REPLY
0
8.0 years ago
endrebak ▴ 970

sam-dump seems a lot faster for me:

sam-dump ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP038/SRP038893/SRR1509032/SRR1509032.sra | head | grep -v '^@' | awk '{print "@"$1"\n"$10"\n+\n"$11}'

Or if your data is paired:

sam-dump <your_data> | grep -v '^@' | awk 'NR%2==1 {print "@"$1"\n"$10"\n+\n"$11}' > samplename_1.fastq 
sam-dump <your_data> | grep -v '^@' | awk 'NR%2==0 {print "@"$1"\n"$10"\n+\n"$11}' > samplename_2.fastq

( awk lines stolen from here: http://www.cureffi.org/2013/07/04/how-to-convert-sam-to-fastq-with-unix-command-line-tools/ )
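
To see what the awk step does without touching the network, here is a small sketch that feeds it a single made-up SAM record. sam_to_fastq is a hypothetical wrapper name around the same grep/awk pipeline; the read name, sequence, and qualities are invented for illustration.

```shell
# Hypothetical wrapper around the pipeline above: drop SAM header lines,
# then emit name (field 1), sequence (field 10) and qualities (field 11)
# as a four-line fastq record.
sam_to_fastq() {
    grep -v '^@' | awk '{print "@"$1"\n"$10"\n+\n"$11}'
}

# One header line plus one invented unaligned read.
printf '@HD\tVN:1.0\nread1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' | sam_to_fastq
```

Only the record line survives the grep, and it comes out as a fastq entry named @read1.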

ADD COMMENT
0
6.4 years ago
sutturka ▴ 190

Please check my answer in this thread. It might be useful.

ADD COMMENT
0
5.1 years ago
sschmeier ▴ 120

Old thread, but have a look at fasterq-dump: https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump
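
A minimal sketch of a typical invocation, for orientation. The accession, thread count, and output directory are placeholders; the script only prints the command unless RUN=1 is set, so it is safe to try on a machine without sra-tools installed.

```shell
# Placeholder values; adjust to your run and machine.
ACC=SRR1509032
THREADS=4
OUTDIR=fastq_out

# --split-files writes each read of a spot to its own file,
# --threads sets the number of worker threads,
# --outdir sets where the fastq files are written.
CMD="fasterq-dump --split-files --threads $THREADS --outdir $OUTDIR $ACC"
echo "$CMD"

# Only execute when explicitly requested, e.g. RUN=1 sh this_script.sh
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

The output is uncompressed fastq, so compression has to happen as a separate step if you need it.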

ADD COMMENT
