Question

Efficient way to download SRA dataset

2.1 years ago

melissachua90 ▴ 70

I want to download the SRP325386 dataset for a class deep sequencing analysis project. It contains 199898 samples.

I used the following command:

esearch -db sra -query SRP325386 | efetch -format runinfo | cut -d ',' -f 1 | grep SRR | xargs -n 1 -P 4 fastq-dump --split-3 --gzip --skip-technical --readids -W --read-filter pass

After 12 hours, it's still downloading (5152 items thus far).

Did I use the wrong command? Is there a more efficient way to download datasets from SRA?

sra • 677 views

updated 2.1 years ago by Matthias Zepper 4.6k • written 2.1 years ago by melissachua90 ▴ 70

You could try replacing fastq-dump with fasterq-dump. Alternatively, have a look at nf-core fetchngs, which makes the download even more convenient because it handles the parallelization for you, whereas your script is limited to the four concurrent fastq-dump processes set by xargs -P 4.
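
One way to act on this suggestion, as a rough sketch rather than a tested recipe: fetch each run with prefetch first, convert it with fasterq-dump, and compress afterwards, since fasterq-dump has no --gzip option. The function name download_run, the fastq output directory, the DRY_RUN switch and the thread count are illustrative assumptions. The fastq-dump filters from the original command (--readids, -W, --read-filter pass) are left out because they may not carry over to fasterq-dump.

```shell
#!/bin/sh
# Sketch: download one SRA run with prefetch + fasterq-dump, then gzip.
# Assumes sra-tools is installed and on PATH.

THREADS="${THREADS:-4}"    # threads per fasterq-dump call (assumed value)
OUTDIR="${OUTDIR:-fastq}"  # output directory (hypothetical name)

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch, convert and compress a single run accession (e.g. an SRR ID).
download_run() {
  acc="$1"
  run prefetch "$acc" || return 1
  run fasterq-dump --split-3 --skip-technical \
      --threads "$THREADS" --outdir "$OUTDIR" "$acc" || return 1
  # With --split-3, output is either ACC.fastq or ACC_1.fastq / ACC_2.fastq.
  for f in "$OUTDIR/$acc.fastq" "$OUTDIR/${acc}_1.fastq" "$OUTDIR/${acc}_2.fastq"; do
    if [ -f "$f" ]; then
      run gzip "$f" || return 1
    fi
  done
}

# Process every accession passed on the command line, one after another.
for acc in "$@"; do
  download_run "$acc"
done
```

Saved as a script (say download_run.sh, a hypothetical name), this could take the place of fastq-dump at the end of the original xargs pipeline. Setting DRY_RUN=1 only prints the commands, which is a cheap way to check the plan before starting a long download.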

2.1 years ago by Matthias Zepper 4.6k

Thanks! Another trivial question: what constitutes an SRA "dataset"? Is it denoted by the "SRP" prefix?

2.1 years ago by melissachua90 ▴ 70

SRA is the Sequence Read Archive operated by the NCBI. Its records are organized into projects or studies (SRP), experiments (SRX), runs (SRR) and samples (SRS, linked to SAMN BioSample accessions). Typically, one will want to download all the data of a study/project, but arbitrary subsets are also possible.

fetchngs should be able to resolve whatever SRA ID it is given and download the accompanying data. Getting started takes a bit more work, but once you know how to run Nextflow and the nf-core pipelines, they are a huge time-saver, and you usually get results that follow best practices.
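
As a rough illustration of what getting started might look like: fetchngs reads a plain text file with one database ID per line, passed via --input. The sketch below assumes Nextflow and Docker are installed. The file name ids.csv, the results directory and the DRY_RUN switch are placeholders, and parameter names can change between pipeline releases, so check the documentation of the version you run.

```shell
#!/bin/sh
# Sketch: fetch all data for one SRA project with nf-core/fetchngs.

IDS_FILE="ids.csv"   # hypothetical file name; one accession per line
OUTDIR="results"     # hypothetical output directory

# The project accession from the question; more IDs can be appended.
printf '%s\n' "SRP325386" > "$IDS_FILE"

fetch_with_nfcore() {
  # -profile docker is an assumption; other container profiles exist too.
  set -- nextflow run nf-core/fetchngs \
      --input "$IDS_FILE" --outdir "$OUTDIR" -profile docker
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # default: only show the command that would be run
  else
    "$@"        # DRY_RUN=0 actually starts the pipeline
  fi
}

fetch_with_nfcore
```

By default the script only prints the nextflow command. Running it with DRY_RUN=0 would launch the pipeline.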

2.1 years ago by Matthias Zepper 4.6k

You have a huge number of samples. Are you sure you actually want to download them all? I hope you have enough storage/bandwidth locally since you may run out of one of them before SRA does :-)
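
A quick way to check this before committing to the download is to total the size_MB column of the run table. The sketch below assumes the runinfo CSV has a header row containing Run and size_MB columns, that size_MB reflects the archived .sra file (the extracted FASTQ will usually be larger), and that both columns come before any free-text field that might contain commas. The function name sum_runinfo_size and the file name runinfo.csv are made up for illustration.

```shell
#!/bin/sh
# Sketch: count runs and sum their archive sizes from an SRA runinfo CSV.

sum_runinfo_size() {
  awk -F',' '
    NR == 1 {
      # Locate the columns by header name instead of hard-coding positions.
      for (i = 1; i <= NF; i++) {
        if ($i == "Run") run = i
        if ($i == "size_MB") size = i
      }
      if (!run || !size) {
        print "missing Run or size_MB column" > "/dev/stderr"
        bad = 1
        exit 1
      }
      next
    }
    # Count only run accessions; this also skips repeated header lines.
    $run ~ /^[SED]RR[0-9]+$/ { n++; total += $size }
    END { if (!bad) printf "%d runs, %.1f GB\n", n, total / 1024 }
  ' "$1"
}

# Example (requires Entrez Direct):
#   esearch -db sra -query SRP325386 | efetch -format runinfo > runinfo.csv
#   sum_runinfo_size runinfo.csv
```

Comparing the reported total with the free space shown by df -h gives a first impression of whether the full download is realistic.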

2.1 years ago by GenoMax 142k