Question: SRA files bulk downloads
9 months ago by
S AR50
Pakistan
S AR50 wrote:

How do I use Aspera or wget to download SRA files in bulk by run, sample, or experiment? My SRA ID list contains experiment (SRX), run (SRR/ERR), and sample (SRS) accessions. I tried prefetch from sratoolkit:

prefetch --list ../XDR_169_ids.txt

XDR_169_ids.txt:

SRS551840
ERR688040
ERR688041
SRS551807
ERR688042
ERR688043
ERR688044
ERR688045
ERR688046
ERR688047
ERR688048
SRR1269497
(...)

But prefetch gave the following error:

2018-11-02T05:27:44 prefetch.2.8.2 warn: '../XDR_169_ids.txt' is invalid or not a kart file

I also converted it to a .table file, which prefetch supports as well, because I don't know what the kart format is. It gave the same error, so I used:

prefetch $(../XDR_169_ids.txt)

It gave an error for every ID; I'm pasting a few:

../XDR_169_ids.txt: line 157: $'ERR234622\r': command not found
../XDR_169_ids.txt: line 158: $'SRS551952\r': command not found
../XDR_169_ids.txt: line 159: $'SRR671794\r': command not found
../XDR_169_ids.txt: line 163: $'SRS552331\r': command not found

I tried:

prefetch ERR688040

Again an error:

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

Any suggestions? I have 4000 SRA IDs and I want to download them as fast as possible. I tried Aspera, but I don't know what to write at the end of the command, where the file name goes (I don't want to give each name in a separate command).

linux recursive aspera awk wget • 1.2k views
modified 9 months ago • written 9 months ago by S AR50

What is ERR688040? Is that ID correct? And why don't you try a .sh script with fastq-dump?

modified 9 months ago • written 9 months ago by k.kathirvel93200

@OP: I abridged the list of accessions a bit to improve readability.

written 9 months ago by ATpoint21k
9 months ago by
ATpoint21k
Germany
ATpoint21k wrote:

prefetch is indeed the way to go here. It uses Aspera internally if you set it up properly. Here is the manual.

IDs with the prefix SRR or ERR are run accessions and can be downloaded directly via prefetch SRR/ERR(...). An SRS (sample) accession can contain multiple experiments/runs, so you first have to get the run accessions from it.

Do it via Entrez Direct (available via conda) as suggested on Biostars previously. Example:

## Extract the run accessions (SRR/ERR/DRR) that belong to a sample:
esearch -db sra -query SRS551840 | efetch -format runinfo | cut -d ',' -f 1 | grep -E 'SRR|ERR|DRR'

## Output:
SRR1159129
SRR1159377
SRR1181071
SRR1181300

In your case, I would make a download list like:

## Extract SRR/ERR:
grep -E 'SRR|ERR' XDR_169_ids.txt > downloads.txt

## Find the run accessions for each SRS (runs from EBI or DDBJ carry ERR/DRR prefixes):
grep 'SRS' XDR_169_ids.txt | parallel "esearch -db sra -query {} | efetch -format runinfo | cut -d ',' -f 1 | grep -E 'SRR|ERR|DRR'" >> downloads.txt

## Now make sure there are no duplicates, then download using GNU parallel to have 4 (or as many as your disk can handle) streams in parallel:
sort -u downloads.txt | parallel -j 4 "prefetch {}"

Once you have the sra files, convert to fastq with parallel-fastq-dump.

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

That means the file is already present in the download folder, so the download of this one should be finished.
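
The accession list in the question also mixes in experiment (SRX) accessions, which the greps above do not cover. Below is a minimal sketch of sorting a mixed list by prefix before deciding what to pass to prefetch directly and what to resolve through Entrez Direct first. The function name `classify_accessions` and the output labels are made up for illustration, and the prefix table assumes the usual INSDC convention (SRR/ERR/DRR for runs, SRX/ERX/DRX for experiments, SRS/ERS/DRS for samples):

```shell
#!/usr/bin/env bash
# classify_accessions: read accessions on stdin, print "<type><TAB><accession>".
# Hypothetical helper, not part of sra-tools or Entrez Direct.
classify_accessions() {
    # Drop carriage returns, then trim surrounding blanks before checking the prefix.
    tr -d '\r' | awk '
        { gsub(/^[ \t]+|[ \t]+$/, "") }
        $0 == ""            { next }
        /^[SED]RR[0-9]+$/   { print "run\t" $0; next }
        /^[SED]RX[0-9]+$/   { print "experiment\t" $0; next }
        /^[SED]RS[0-9]+$/   { print "sample\t" $0; next }
                            { print "unknown\t" $0 }
    '
}
```

For example, `classify_accessions < XDR_169_ids.txt | awk -F'\t' '$1 == "run" {print $2}'` would list only the runs. The sample and experiment rows should be resolvable to runs with the same esearch/efetch pipeline shown above, though that part is not exercised here.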

modified 9 months ago • written 9 months ago by ATpoint21k

ATpoint, that's great, I'll try this. Thank you.

written 9 months ago by S AR50

Did it work for you?

written 9 months ago by ATpoint21k

Hi ATpoint,

Sorry, I was out of the country attending a conference and only tried it today. And yes, it did work: it extracted the runs for all those SRS IDs. But when I tried:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

I'm getting the following error. Can you help me with this?

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133854 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133900 ' cannot be found.

2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133890 ' cannot be found.
modified 9 months ago • written 9 months ago by S AR50
9 months ago by
S AR50
Pakistan
S AR50 wrote:

I tried to fetch a single ID:

prefetch ERR133900

After 15 minutes or so it gave these log messages, and I couldn't find the ERR133900 file anywhere:

2018-11-13T04:27:21 prefetch.2.8.2: 1) Downloading 'ERR133900'...
2018-11-13T04:27:21 prefetch.2.8.2:  Downloading via https...
2018-11-13T04:40:16 prefetch.2.8.2: 1) 'ERR133900' was downloaded successfully
2018-11-13T04:40:23 prefetch.2.8.2: 'ERR133900' has 1 unresolved dependency
2018-11-13T04:40:27 prefetch.2.8.2: 2) Downloading 'ncbi-acc:AL123456.2?vdb-ctx=refseq'...
2018-11-13T04:40:27 prefetch.2.8.2:  Downloading via https...
2018-11-13T04:40:30 prefetch.2.8.2: 2) 'ncbi-acc:AL123456.2?vdb-ctx=refseq' was downloaded successfully
2018-11-13T04:40:41 prefetch.2.8.2: 'ERR133900' has no remote vdbcache
written 9 months ago by S AR50

This means that the download has finished successfully. Please note that SRA files are not self-contained. This particular SRA file comprises a mapping of the reads to the reference sequence AL123456.2 (isolate H37Rv). This reference sequence was also downloaded to your local disk. The following command will dump the first two read pairs:

fastq-dump --split-spot -Z ERR133900 | head -8
written 9 months ago by piet1.7k

Oh. But my other problem is that I want to download everything in bulk in one go, and for that I was getting errors:

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.

2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.

like this.

For bulk download i used the command:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

But as I mentioned above, when I do it manually it does download, but I don't know where to; the file is not showing up in my folder.

written 9 months ago by S AR50

There is a whitespace character after your accessions: 'ERR067743 ' instead of 'ERR067743'. Remove it.

modified 9 months ago • written 9 months ago by ATpoint21k

I did remove it, but I still get the same error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.

2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.
written 9 months ago by S AR50

There is still a whitespace, don't you see that?

The command itself is correct; your input file has flaws. Try:

sort -u downloads.txt | awk '{gsub(" ", "", $1);print $1}' | parallel prefetch {}

When I download the files you indicate and artificially add a whitespace character after the accession number, I get the same error. Removing it solves the issue. That means you still have whitespace in your file. There is also no point in refreshing older posts on prefetch downloads. It is simply your input file that is wrong.
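
The `$'ERR234622\r'` messages earlier in the thread suggest the ID list was saved with Windows line endings, so the trailing character may be a carriage return rather than a space, which `gsub(" ", "", $1)` would not remove. Below is a minimal sketch that strips carriage returns as well as spaces and tabs before deduplicating; `clean_ids` is a name chosen here for illustration:

```shell
#!/usr/bin/env bash
# clean_ids: read accessions on stdin, one per line, and print a sorted,
# de-duplicated list with carriage returns, blanks, and empty lines removed.
clean_ids() {
    tr -d '\r' |            # Windows line endings leave a trailing \r
        tr -d ' \t' |       # spaces and tabs around the accession
        grep -v '^$' |      # skip lines that are now empty
        sort -u
}

# To look for invisible characters first (GNU cat shows a carriage return
# as ^M and the end of each line as $):
#   cat -A downloads.txt | head
# Intended use, not run here:
#   clean_ids < downloads.txt | parallel -j 4 "prefetch {}"
```

Recent sra-tools documentation also describes `prefetch --option-file <file>` for reading accessions from a file; whether it behaves the same way in 2.8.2 has not been verified here.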

modified 9 months ago • written 9 months ago by ATpoint21k

...and? Did that solve it?

written 9 months ago by ATpoint21k

Yes, kind of. I just had to break my list into 3 parts, and then it worked. It is still missing a few IDs, but I can manage those few manually.
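
To find the few accessions that are still missing, the download list can be compared against the files on disk. Below is a minimal sketch, assuming prefetch wrote `<accession>.sra` files into a single directory. The default of `~/ncbi/public/sra` used here is where sra-tools 2.x typically puts downloads, but it may differ on your setup (check with `vdb-config`), and `missing_sra` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# missing_sra LIST [DIR]: print accessions from LIST with no DIR/<acc>.sra file.
missing_sra() {
    local list="$1"
    local dir="${2:-$HOME/ncbi/public/sra}"   # assumed default download location
    local acc
    # Strip carriage returns and blanks so the names match the files on disk.
    tr -d '\r \t' < "$list" | sort -u | while IFS= read -r acc; do
        [ -n "$acc" ] || continue
        # -s: the file exists and is not empty
        [ -s "$dir/$acc.sra" ] || printf '%s\n' "$acc"
    done
}
```

Running `missing_sra downloads.txt > retry.txt` would then give a short list that can be fed back to prefetch, for example with `parallel -j 4 "prefetch {}" < retry.txt`.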

written 9 months ago by S AR50