Question: download from SRA
2
3.2 years ago by
zh.khodadadi20
zh.khodadadi20 wrote:

How can I download a list of SRR accessions from SRA with the SRA Toolkit? How should the list of SRR numbers be set up?

rna-seq • 5.0k views
modified 4 weeks ago by MSF0 • written 3.2 years ago by zh.khodadadi20

Did you read the tutorial?

How to download raw sequence data from GEO/SRA

written 18 months ago by YaGalbi1.4k

Sorry to bring up an old thread, but...

What is the difference between prefetch and fastq-dump?

From what I read, both will download the SRR file, but one in SRA format and the other in FASTQ format? If so, what is SRA format? And if I have misunderstood, please elaborate.

written 4 weeks ago by MSF0

Check out this SRA Download guide from NCBI for answers to your questions.

written 4 weeks ago by genomax74k

As I wrote in my comment, "From what I read": I had already been reading there :). To me, it does not make sense to have prefetch. Why add an extra step to get the data format you want when you can just fastq-dump whatever you want directly? Or am I missing something about prefetch?

written 4 weeks ago by MSF0

The ‘prefetch’ utility in the SRA Toolkit can be used to download SRA data and any required reference sequences in a single operation.

For some datasets, the data may have been uploaded as reference-compressed files. To recreate the original sequence data, you need the exact reference that was used for that compression. As the line above indicates, prefetch downloads the data and the reference in one step.

If you do not use prefetch for such data, then:

you will then need to determine (1) if your downloaded dataset is reference-compressed, (2) if so, which references are required to access the data (see vdb-dump for an example of how to determine this), and (3) acquire the reference sequences manually.

Whenever possible, you should avoid using SRA (except for datasets that need authorization) and download the data in FASTQ format directly from EBI/ENA. See: Fast download of FASTQ files from the European Nucleotide Archive (ENA)

modified 4 weeks ago • written 4 weeks ago by genomax74k
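
The two-step route discussed in this exchange (prefetch first, then convert) can be sketched as a small shell function. This is a minimal sketch, not an official recipe: the function name `fetch_and_convert` is made up for illustration, and it assumes that `prefetch` and `fastq-dump` from the SRA Toolkit are on your PATH and that both steps run from the same directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is illustrative, not part of the SRA Toolkit).
# Step 1: prefetch downloads the run in SRA format, plus any reference
#         sequences it needs.
# Step 2: fastq-dump converts the already-downloaded run to FASTQ.
fetch_and_convert() {
    acc="$1"
    if [ -z "$acc" ]; then
        echo "usage: fetch_and_convert ACCESSION" >&2
        return 1
    fi
    # Only convert if the download succeeded.
    prefetch "$acc" && fastq-dump --split-3 "$acc"
}

# Example call (needs network access):
# fetch_and_convert SRR925811
```

Chaining the two commands with `&&` means the conversion is skipped when the download fails, so a broken transfer does not leave you with a partial FASTQ file.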
6
3.2 years ago by
st.ph.n2.5k
Philadelphia, PA
st.ph.n2.5k wrote:

Throw your SRR numbers into a file called SRR_list.txt, one number per line.

Then add this to a file called get_SRR_data.sh:

    #!/usr/bin/bash

    # $1 is the accession number passed in by xargs
    fastq-dump --split-3 "$1"

and run on the command line with:

cat SRR_list.txt | xargs -n 1 bash get_SRR_data.sh

fastq-dump will pull the data one by one for all accession numbers in your list and convert each to FASTQ at the same time. The --split-3 option will create paired-end files if available. Provide the full path to fastq-dump in the bash script if it is not installed globally on your system.

If you prefer @Satya's suggestion of using wget:

#!/usr/bin/bash

# The FTP directory level above the run is SRR plus the first three digits
prefix="${1:0:6}"
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/"$prefix"/"$1"/"$1".sra

fastq-dump --split-3 "$1".sra
modified 3.2 years ago • written 3.2 years ago by st.ph.n2.5k
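
The list-driven approach in this answer (SRR_list.txt, one accession per line) can also be sketched as a single script with a plain read loop instead of xargs. This is only a sketch under assumptions: the function name `dump_list` is hypothetical, `fastq-dump` is assumed to be on your PATH, and the failure reporting is an addition that the original answer does not include.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run fastq-dump on every accession in a list file.
# Assumes one accession per line; blank lines are skipped.
dump_list() {
    list_file="$1"
    failed=0
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue
        # </dev/null stops the command from consuming the rest of the list.
        if ! fastq-dump --split-3 "$acc" < /dev/null; then
            echo "failed: $acc" >&2
            failed=$((failed + 1))
        fi
    done < "$list_file"
    # Non-zero exit status if any accession failed.
    [ "$failed" -eq 0 ]
}

# Example call (needs network access):
# dump_list SRR_list.txt
```

Unlike the xargs one-liner, the loop keeps going after a failed accession and reports which ones failed, which can help with long lists.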
0
3.2 years ago by
Mike1.4k
UK
Mike1.4k wrote:

Have a look at @Obi Griffith's previous post:

Determine the SRR number and then download the data at the command-line with:

prefetch -v SRR925811

How to download raw sequence data from GEO/SRA

written 3.2 years ago by Mike1.4k
0
3.2 years ago by
Satyajeet Khare1.5k
Pune, India
Satyajeet Khare1.5k wrote:

I use wget to download

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR(first three digits)/SRR(all digits)/SRR(all digits).sra

and fastq-dump to convert to fastq

fastq-dump --split-3 SRR(all digits).sra
modified 3.2 years ago • written 3.2 years ago by Satyajeet Khare1.5k
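
The path pattern in this answer (SRR, then SRR plus the first three digits, then the full accession) can be turned into a small helper that builds the URL for you. A minimal sketch: the function name `sra_ftp_url` is hypothetical, it only covers the SRR layout written out above, and it does not check whether the server still serves that path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the FTP URL for a run from the pattern
# SRR/SRR(first three digits)/SRR(all digits)/SRR(all digits).sra
sra_ftp_url() {
    acc="$1"
    # First six characters = "SRR" plus the first three digits.
    prefix="$(printf '%s' "$acc" | cut -c1-6)"
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/%s/%s/%s.sra\n' \
        "$prefix" "$acc" "$acc"
}

# Example call (the download itself needs network access):
# wget "$(sra_ftp_url SRR925811)"
```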

There's no need to pull the data, and then convert to fastq. fastq-dump will do both for you.

written 3.2 years ago by st.ph.n2.5k

I agree, but wget with ftp is way faster, unless there is a way to use fastq-dump with ftp that I am not aware of.

modified 3.2 years ago • written 3.2 years ago by Satyajeet Khare1.5k

As far as I know, SRA can block your IP if you download a lot of files with wget.

written 18 months ago by grant.hovhannisyan1.8k

In my experience, the fastest and most reliable approach (no connection interruptions) is to use prefetch with Aspera, then convert the SRA files to FASTQ with fastq-dump. The whole thing saves a lot of time.

written 18 months ago by grant.hovhannisyan1.8k
0
18 months ago by
Columbia University
Federico Giorgi540 wrote:

You can use xargs with the SRA Toolkit's prefetch to download every SRR ID listed in a text file, like this:

xargs -n1 prefetch < SRR_Acc_List.txt
modified 18 months ago • written 18 months ago by Federico Giorgi540

I'm using this, but I get a very weird error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067637 ' cannot be found.

Can you please help?

written 12 months ago by S AR50
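
One thing worth checking for the error above: the blank inside the quotes ('ERR067578 ') hints that each accession may carry an invisible trailing character, for example a carriage return from a list saved with Windows line endings. That is a guess from the log text, not something the thread confirms. Under that assumption, a sketch like the following cleans the list before it reaches prefetch; the function name `clean_accessions` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cleaned copy of an accession list.
# Removes carriage returns, strips leading and trailing whitespace,
# and drops empty lines.
clean_accessions() {
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$'
}

# Example call (needs network access):
# clean_accessions SRR_Acc_List.txt | xargs -n1 prefetch
```

If the error persists with a cleaned list, the cause lies elsewhere and this sketch will not help.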