3.6 years ago
S AR ▴ 80

How do I use Aspera or wget to download SRA files in bulk by run, sample, or experiment? My SRA ID list contains experiment (SRX), run (SRR/ERR), and sample (SRS) IDs. I tried prefetch from sratoolkit:

prefetch --list ../XDR_169_ids.txt


XDR_169_ids.txt:

SRS551840
ERR688040
ERR688041
SRS551807
ERR688042
ERR688043
ERR688044
ERR688045
ERR688046
ERR688047
ERR688048
SRR1269497
(...)


But prefetch gave the following error:

2018-11-02T05:27:44 prefetch.2.8.2 warn: '../XDR_169_ids.txt' is invalid or not a kart file


I converted it to a .table file, which prefetch also supports, because I don't know what the kart format is, but it gave the same error. So I used:

prefetch $(../XDR_169_ids.txt)

It gave an error for all IDs; I am pasting a few:

../XDR_169_ids.txt: line 157: $'ERR234622\r': command not found
../XDR_169_ids.txt: line 158: $'SRS551952\r': command not found
../XDR_169_ids.txt: line 159: $'SRR671794\r': command not found
../XDR_169_ids.txt: line 163: $'SRS552331\r': command not found

I tried:

prefetch ERR688040

and again got an error:

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

Any suggestions? I have 4000 SRA IDs and I want to download them as fast as possible. I tried Aspera, but I don't know what to write at the end of the command where the file name goes (I don't want to give each name in a separate command).

aspera wget recursive awk linux • 7.8k views

What is that ERR688040? Is that ID correct? And why don't you try a .sh script with fastq-dump?

@OP: I abridged the list of accessions a bit to improve readability.

3.6 years ago
ATpoint 62k

prefetch is indeed the way to go here. Prefetch uses Aspera internally if you set it up properly. Here is the manual. IDs with the prefix SRR or ERR can be downloaded directly via prefetch SRR/ERR(...). An SRS accession can contain multiple experiments/runs, so you first have to get the SRR numbers from it. Do it via Entrez Direct (available via conda) as suggested on Biostars previously. Example:

## Extract SRA/ERR:
esearch -db sra -query SRS551840 | efetch --format runinfo | cut -d ',' -f 1 | grep SRR

## Output:
SRR1159129
SRR1159377
SRR1181071
SRR1181300

In your case, I would make a download list like this:

## Extract SRR/ERR:
grep -E 'SRR|ERR' XDR_169_ids.txt > downloads.txt

## Find SRAs from SRS:
grep 'SRS' XDR_169_ids.txt | parallel "esearch -db sra -query {} | efetch --format runinfo | cut -d ',' -f 1 | grep SRR" >> downloads.txt

## Now make sure there are no duplicates, then download using GNU parallel to have 4 (or as many as your disk can handle) streams in parallel:
sort -u downloads.txt | parallel -j 4 "prefetch {}"

Once you have the sra files, convert them to fastq with parallel-fastq-dump.

2018-11-02T05:32:10 prefetch.2.8.2: 1) 'ERR688040' is found locally

That means the file is already present in the download folder, so the download of this one should be finished.

ATpoint, that's great, I'll try this. Thank you.

Did it work for you?

Hi ATpoint, sorry, I was out of the country attending a conference and only tried it today. And yes, it worked: it extracted the runs for all those SRS IDs. But when I tried:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

I got the following errors. Can you help me with this?

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.
2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133854 ' cannot be found.
2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133900 ' cannot be found.
2018-11-13T04:24:14 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR133890 ' cannot be found.

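The list-building step from the answer above can also be sketched with anchored patterns, so that experiment accessions (SRX), which the question mentions, are routed to the "resolve first" list together with the SRS ones. This is only a sketch: `split_accessions` and the output file names are hypothetical, the DRR/ERS/ERX/DRS/DRX prefixes are included on the assumption that they may occur in such a list, and the input is assumed to hold one clean accession per line (no trailing spaces or carriage returns).

```shell
#!/usr/bin/env bash
# split_accessions LIST RUNS_OUT RESOLVE_OUT   (hypothetical helper)
# Writes run accessions to RUNS_OUT and sample/experiment accessions,
# which must be resolved to runs before prefetch, to RESOLVE_OUT.
split_accessions() {
    local list=$1 runs_out=$2 resolve_out=$3
    # Run accessions: SRR, ERR or DRR followed by digits only.
    grep -E '^[SED]RR[0-9]+$' "$list" | sort -u > "$runs_out"
    # Sample (xRS) and experiment (xRX) accessions.
    grep -E '^[SED]R[SX][0-9]+$' "$list" | sort -u > "$resolve_out"
}

# Example call (not executed here):
#   split_accessions XDR_169_ids.txt downloads.txt to_resolve.txt
```

Lines that match neither pattern end up in neither file, so comparing line counts before and after is a quick way to spot malformed entries.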
3.6 years ago
S AR ▴ 80

I tried to fetch one ID:

prefetch ERR133900

After 15 minutes or so it gave these log messages, and I didn't find the ERR133900 file anywhere:

2018-11-13T04:27:21 prefetch.2.8.2: 1) Downloading 'ERR133900'...
2018-11-13T04:27:21 prefetch.2.8.2: Downloading via https...
2018-11-13T04:40:16 prefetch.2.8.2: 1) 'ERR133900' was downloaded successfully
2018-11-13T04:40:23 prefetch.2.8.2: 'ERR133900' has 1 unresolved dependency
2018-11-13T04:40:27 prefetch.2.8.2: 2) Downloading 'ncbi-acc:AL123456.2?vdb-ctx=refseq'...
2018-11-13T04:40:27 prefetch.2.8.2: Downloading via https...
2018-11-13T04:40:30 prefetch.2.8.2: 2) 'ncbi-acc:AL123456.2?vdb-ctx=refseq' was downloaded successfully
2018-11-13T04:40:41 prefetch.2.8.2: 'ERR133900' has no remote vdbcache

This means that the download has finished successfully. Please note that SRA files are not self-contained. This particular SRA file comprises a mapping of the reads to reference sequence AL123456.2 (isolate H37Rv). This reference sequence was also downloaded to your local disk. The following command will dump the first two read pairs:

fastq-dump --split-spot -Z ERR133900 | head -8

Oh. But my other problem was that I want to download in bulk in one go, for which I was getting errors:

2018-11-13T04:24:10 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067743 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117453 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR108480 ' cannot be found.
2018-11-13T04:24:12 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR117454 ' cannot be found.

and so on. For the bulk download I used the command:

sort -u downloads.txt | parallel -j 4 "prefetch {}"

But as I mentioned above, when I do it manually it does download, but I don't know where to; it is not showing in my folder.

There is whitespace after your accessions: 'ERR067743 ' instead of 'ERR067743'. Remove that.

I did remove it, but I still get the same error:

2018-11-14T08:47:00 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067578 ' cannot be found.
2018-11-14T08:47:01 prefetch.2.8.2 err: libs/vfs/resolver.c:3350:VResolverQueryPath: path not found while resolving tree within virtual file system module - 'ERR067621 ' cannot be found.

There is still whitespace, don't you see that? The command itself is correct; your input file has flaws. Try:

sort -u downloads.txt | awk '{gsub(" ", "",$1);print $1}' | parallel prefetch {}


When I download the files you indicate and artificially add whitespace after the accession number, I get the same error. Removing it solves the issue. That means you still have whitespace. There is also no point in reviving older posts on prefetch downloads. It is simply your input file that is wrong.
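A sketch of how the input file could be inspected and cleaned before it reaches prefetch. The bash errors earlier in the thread show entries such as `$'ERR234622\r'`, which suggests the list may have been saved with Windows (CRLF) line endings; a trailing carriage return would survive a filter that removes only spaces. The function names `show_dirty` and `clean_accessions` are made up for illustration.

```shell
#!/usr/bin/env bash
# show_dirty FILE   (hypothetical helper)
# Prints, with line numbers, every line that contains anything other than
# letters and digits. cat -vet makes invisible characters visible:
# a carriage return shows as ^M, a tab as ^I, and $ marks the line end.
show_dirty() {
    LC_ALL=C grep -n '[^A-Za-z0-9]' "$1" | cat -vet
}

# clean_accessions FILE   (hypothetical helper)
# Prints the accessions with carriage returns, spaces and tabs removed,
# empty lines dropped and duplicates merged.
clean_accessions() {
    tr -d '\r \t' < "$1" | grep -v '^$' | sort -u
}

# Example call (not executed here):
#   clean_accessions downloads.txt | parallel -j 4 prefetch {}
```

If `show_dirty downloads.txt` prints nothing, the list contains only letters and digits and the whitespace explanation can be ruled out.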

...and? solved it?

Yes, kind of. I just had to break my list into three parts and then it works. It is still missing a few IDs, but I can handle those few manually.
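For the few IDs that go missing, a sketch like the following could list them instead of checking by hand. `missing_runs` is a hypothetical helper, and the directory to pass is an assumption: older sra-tools releases such as the 2.8.2 used in this thread store downloads under `$HOME/ncbi/public/sra` by default (which may be why the file was "not showing" in the working folder), while newer releases write an `ACC/ACC.sra` folder into the current directory. Check your own configuration before relying on either path.

```shell
#!/usr/bin/env bash
# missing_runs LIST SRA_DIR   (hypothetical helper)
# Prints every accession from LIST that has neither SRA_DIR/ACC.sra
# nor SRA_DIR/ACC/ACC.sra on disk.
missing_runs() {
    local list=$1 sra_dir=$2 acc
    # The "|| [ -n ... ]" part also handles a last line without newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -n "$acc" ] || continue          # skip empty lines
        if [ ! -e "$sra_dir/$acc.sra" ] && [ ! -e "$sra_dir/$acc/$acc.sra" ]; then
            printf '%s\n' "$acc"
        fi
    done < "$list"
}

# Example call (not executed here), retrying only what is missing:
#   missing_runs downloads.txt "$HOME/ncbi/public/sra" | parallel -j 4 prefetch {}
```

Because the output is again a plain accession list, the retry can be repeated until it prints nothing.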