4.8 years ago

How can I download a list of SRR accessions from SRA with the SRA Toolkit? How should the list of SRR numbers be set up?

rna-seq • 8.3k views

Sorry to bring up an old thread, but...

What is the difference between prefetch and fastq-dump?

From what I read, both will download the SRR file, but one in SRA format and the other in FASTQ format? If so, what is SRA format? And if what I understood is wrong, please elaborate.


As I wrote in my comment, "From what I read", I had already been reading there :). To me, prefetch does not make sense: why add an extra step to get the data format you want when you can fastq-dump whatever you want directly? Or am I missing something about prefetch?


The ‘prefetch’ utility in the SRA Toolkit can be used to download SRA data and any required reference sequences in a single operation.

For some datasets, data may be uploaded as reference-compressed files. To recreate the original sequence data, one needs the exact reference used for that compression. As the line above indicates, prefetch downloads the data and the reference in one step.

If you do not use prefetch for such data, then:

you will then need to determine (1) if your downloaded dataset is reference-compressed, (2) if so, which references are required to access the data (see vdb-dump for an example of how to determine this), and (3) acquire the reference sequences manually.

Whenever possible, you should avoid using SRA (except for datasets that need authorization) and download data in FASTQ format directly from EBI/ENA. See: Fast download of FASTQ files from the European Nucleotide Archive (ENA)
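As a rough sketch of the ENA route: the ENA Portal API has a `filereport` endpoint that can report the FASTQ locations for a run. The helper name `ena_fastq_report_url` is hypothetical, and the choice of the `fastq_ftp` field is an assumption about what you want back, so check the current ENA documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the ENA Portal API "filereport" URL for one run accession.
# Assumption: requesting the fastq_ftp field returns the FASTQ file locations for the run.
ena_fastq_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=fastq_ftp\n' "$1"
}
```

Something like `wget -qO- "$(ena_fastq_report_url SRR925811)"` would then fetch the report, and the listed files can be downloaded with wget as usual.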

4.8 years ago
st.ph.n ★ 2.6k

Throw your SRR numbers into a file called SRR_list.txt, one number per line.

Then add this to a file called get_SRR_data.sh

    #!/usr/bin/bash
    fastq-dump --split-3 "$1"

and run on the command line with:

    cat SRR_list.txt | xargs -n 1 bash get_SRR_data.sh

fastq-dump will pull the data, one by one, for all accession numbers in your list, and turn each into a FASTQ file at the same time. The --split-3 option will create paired-end files if available. Provide the path to fastq-dump in the bash script if it is not installed globally on your system.

If you prefer @Satya's suggestion of using wget:

    #!/usr/bin/bash
    # ${1:0:6} is "SRR" plus the first three digits, the directory level used in @Satya's path
    wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/"${1:0:6}"/"$1"/"$1".sra
    fastq-dump --split-3 "$1".sra

4.8 years ago
Mike ★ 1.7k

Have a look at @Obi Griffith's previous post:

Determine the SRR number and then download the data at the command-line with:

prefetch -v SRR925811


4.8 years ago
Satyajeet Khare ★ 1.6k

I use wget to download

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR(first three digits)/SRR(all digits)/SRR(all digits).sra


and fastq-dump to convert to fastq

fastq-dump --split-3 SRR(all digits).sra
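To make the placeholder pattern above concrete, here is a minimal sketch that expands an accession into that path. The function name `sra_ftp_url` is made up for illustration, and the directory layout is taken only from the pattern in this answer, so treat it as an assumption about the server.

```shell
#!/bin/sh
# Hypothetical helper: expand an SRR accession into the FTP path pattern shown above.
# "SRR(first three digits)" is the first six characters of the accession.
sra_ftp_url() {
    acc=$1
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/%s/%s/%s.sra\n' "$prefix" "$acc" "$acc"
}
```

For example, `wget "$(sra_ftp_url SRR925811)"` followed by `fastq-dump --split-3 SRR925811.sra` would reproduce the two steps for one run.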


There's no need to pull the data, and then convert to fastq. fastq-dump will do both for you.


I agree, but wget with ftp is way faster, unless there is a way to use fastq-dump with ftp that I am not aware of.


As far as I know, SRA can block your IP if you download a lot of files with wget.


In my experience, the fastest and most reliable approach (no connection interruptions) is to use prefetch with Aspera, then convert the SRA files to FASTQ with fastq-dump. The whole thing saves a lot of time.
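A minimal sketch of that two-step workflow for a single accession might look like the following. The function name `fetch_and_convert` and the `PREFETCH` / `FASTQ_DUMP` override variables are hypothetical conveniences, not toolkit settings, and Aspera setup is not shown here.

```shell
#!/bin/sh
# Sketch: download with prefetch, then convert with fastq-dump.
# PREFETCH and FASTQ_DUMP are hypothetical override variables;
# set them if the tools are not on your PATH.
fetch_and_convert() {
    acc=$1
    # Stop early if the download fails, so no conversion is attempted.
    "${PREFETCH:-prefetch}" "$acc" || return 1
    # Assumption: fastq-dump resolves the accession to the copy prefetch just downloaded.
    "${FASTQ_DUMP:-fastq-dump}" --split-3 "$acc"
}
```

It could be combined with a list file as `xargs -n1 sh -c '. ./fetch.sh; fetch_and_convert "$0"' < SRR_list.txt`, assuming the function is saved in a file named fetch.sh.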

3.1 years ago

You can use xargs and the SRA Toolkit's prefetch to download every SRR ID listed in a text file, like:

xargs -n1 prefetch < SRR_Acc_List.txt


I'm using this, but I get a very weird error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067637 ' cannot be found.
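The accessions quoted in the log above end with a space ('ERR067578 '), which hints that stray whitespace or Windows line endings in the list file may be reaching prefetch. That is a guess from the log, not a confirmed diagnosis. Under that assumption, a small sketch that normalizes the list first (the helper name `clean_accessions` is made up):

```shell
#!/bin/sh
# Hypothetical helper: print one clean accession per line from a list file.
clean_accessions() {
    # Drop carriage returns, strip leading/trailing whitespace, skip empty lines.
    tr -d '\r' < "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -v '^$'
}
```

The cleaned list can then be piped on as before: `clean_accessions SRR_Acc_List.txt | xargs -n1 prefetch`.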