Efficient Bulk Data Retrieval from NCBI BioProject
5 months ago
George ▴ 10

Hello,

A month ago, I used the SRA Toolkit to download FASTQ files for a BioProject accession. Following the recommended steps, I generated a list of SRR accessions, ran prefetch on each, and then converted the data locally with fasterq-dump (via parallel-fastq-dump), which gave me fq.gz files named after the corresponding SRR accessions.
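
For readers who want to reproduce the per-run workflow described above, here is a minimal sketch rather than the exact script used. It assumes prefetch and fasterq-dump are on PATH; the variables DRY_RUN, OUTDIR and THREADS, the helper names run_cmd and fetch_run, and the accession SRR0000001 are all hypothetical placeholders. Since fasterq-dump writes uncompressed FASTQ, the sketch compresses with gzip as a separate step.

```shell
#!/usr/bin/env bash
# Sketch of the per-run workflow: prefetch, then fasterq-dump, then gzip.
# With DRY_RUN=1 (the default here) the commands are only printed.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"     # hypothetical switch: set to 0 to really run
OUTDIR="${OUTDIR:-fastq}"   # hypothetical output directory
THREADS="${THREADS:-4}"     # hypothetical thread count

# Print the command in dry-run mode, otherwise execute it.
run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download one run accession and convert it to compressed FASTQ.
fetch_run() {
  local acc="$1"
  run_cmd prefetch "$acc"
  run_cmd fasterq-dump --split-files --threads "$THREADS" --outdir "$OUTDIR" "$acc"
  # fasterq-dump has no built-in compression, so gzip the result afterwards.
  run_cmd gzip "$OUTDIR/$acc"*.fastq
}

# Example call with a placeholder accession:
#   fetch_run SRR0000001
```

To process a whole list, one option is to loop over a text file with one accession per line and call fetch_run on each.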

Recently, while writing a review for my project, I tried running prefetch with the BioProject accession itself. Surprisingly, it not only worked but also downloaded the files in fq.gz format, which prefetch supposedly cannot do. Furthermore, the files were named with the original project IDs (as used in the paper) rather than the SRR accessions. I am puzzled by this behavior and would appreciate any insight into why it occurred.

ncbi SRAtoolkit prefetch • 685 views
Anecdotal evidence is hard to comment on. Give a precise code example for reproduction.

Hey, sorry if my post was not adequate. To retrieve all the fq.gz data, I just ran: prefetch PRJNA393611

Were you using two separate versions of sratoolkit at the two times? Functionality is routinely added with newer versions. Additional command line options may have been added to change the default behavior. There can be many explanations.
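
One way to settle the version question is to record which toolkit binaries are actually on PATH before each download. The following is only a sketch: report_versions is a hypothetical helper name, and it relies on nothing more than the --version flag that the SRA Toolkit binaries accept.

```shell
#!/usr/bin/env bash
# Print the version banner of each requested tool, or note that it is missing.
report_versions() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Collapse the multi-line banner into a single line.
      printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | tr -s '\n' ' ')"
    else
      printf '%s: not found on PATH\n' "$tool"
    fi
  done
}

report_versions prefetch fasterq-dump
```

Saving this output next to the downloaded data makes it easier to tell later whether a change in behavior coincided with a toolkit upgrade.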

I guess it is something they added recently that is not documented yet, although I find it peculiar that the downloaded data lack the SRR names and instead carry the names assigned by the authors. Regardless, I hope that in the future someone finds this post and simply runs prefetch with the library name (the BioProject accession), without needing a script to retrieve every accession one by one as I did:

for i in {344..819}; do prefetch SRX3057$i; done

To answer your question: no, the version was the same, but I didn't know prefetch could download all the FASTQ files with just the library name. Anyway, thank you for your insights!
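
If the accessions are not consecutive, a loop over a numeric range like the one above will miss runs or request ones that do not exist. A possible alternative, sketched here under assumptions, is to take the run list from the project's RunInfo table. This assumes Entrez Direct (esearch, efetch) is installed; runs_from_runinfo, RUN_FETCH, PROJECT and runs.txt are hypothetical names, and the network step is skipped unless RUN_FETCH=1 is set.

```shell
#!/usr/bin/env bash
# Extract run accessions (SRR/ERR/DRR) from a RunInfo CSV read on stdin.
# The first CSV column is the run accession; header and blank lines are skipped.
runs_from_runinfo() {
  awk -F',' '$1 ~ /^[SED]RR[0-9]+$/ { print $1 }'
}

PROJECT="${PROJECT:-PRJNA393611}"   # BioProject accession from the thread

# Network step, only executed when explicitly requested (hypothetical switch).
if [ "${RUN_FETCH:-0}" = "1" ] && command -v esearch >/dev/null 2>&1; then
  esearch -db sra -query "$PROJECT" \
    | efetch -format runinfo \
    | runs_from_runinfo > runs.txt
  # prefetch reads the accessions to download from the file.
  prefetch --option-file runs.txt
fi
```

The parsing function can be checked offline by piping a saved RunInfo file into it, e.g. `runs_from_runinfo < runinfo.csv`.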
