Question: Getting multiple *.sra files from NCBI using a list
3.2 years ago, chrys wrote:

Hi there folks, I am trying to download SRA files for a dataset I compiled from the NIH Roadmap Data Matrix. The problem is that the dataset spans about 50-60 files. I have little intention of downloading them by hand, and I thought a quick wget would help. For some reason, though, each link provided just points to another subdirectory rather than to the actual *.sra file, which makes downloading them a lot harder:



My script so far does this:

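# Read "name url" pairs from List and download each URL into a directory named after the sample.
while read -r name files; do
    mkdir -p "$name"
    # -P takes the download directory; keep it relative so files land in ./$name/
    wget -P "$name/" "$files"
done < List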

I tried some different approaches like (and many others):

wget --no-parent -r -l1

My list just contains my sample names and the links provided by the data matrix. But wget seems to have problems reaching the subdirectory with the *.sra file unless the path is specified explicitly.
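
One way the recursive approach might be made to work is sketched below. It assumes each link in the list is a directory listing with the *.sra file at most two levels beneath it (the depth of 2 is a guess), and `print_wget_cmds` is a made-up helper name. It only prints the commands, so they can be inspected before anything is downloaded.

```shell
# Sketch: print one recursive wget command per "name url" line of a list file.
# Assumption: every URL is a directory whose *.sra file sits at most 2 levels down.
print_wget_cmds() {
    while read -r name url; do
        # Skip blank or incomplete lines.
        [ -n "$name" ] && [ -n "$url" ] || continue
        # Ensure exactly one trailing slash so wget treats the URL as a
        # directory; --no-parent relies on that to stay inside it.
        url="${url%/}/"
        # -r -l2: recurse two levels; -nd: do not recreate the remote tree;
        # -A: keep only *.sra files; -P: download into the sample directory.
        printf "mkdir -p '%s' && wget -r -l2 --no-parent -nd -A '*.sra' -P '%s' '%s'\n" \
            "$name" "$name" "$url"
    done < "$1"
}
```

Once the printed commands look right, something like `print_wget_cmds List | sh` would run them. Sample names containing single quotes would break the quoting in this sketch.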

If anybody has an idea on how to solve this, I would be eternally grateful; by now I probably would have been done downloading them by hand.

sra ncbi • 2.2k views
modified 2.7 years ago by Biostar • written 3.2 years ago by chrys

How about just using fastq-dump?

written 3.2 years ago by Pierre Lindenbaum

Well, it would work, but I would still need the explicit identifier of every experiment, since the GEO accession does not work. I would also rather keep the *.sra files, since their compression is quite high and I do not need them all at once but iteratively. Unfortunately, the NIH Roadmap does not let me export the direct *.sra identifiers, only the mentioned subdirectories, meaning that I would still have to look up the SRA IDs by hand, correct? Sorry if I overlooked something terribly obvious.

Short example:

Then I hit export, take the file and create my list from this. If I am doing something stupid please let me know. Thank you for your help.

written 3.2 years ago by chrys

See if getting them from EBI-ENA is less painful. You could get fastq files directly, avoiding sratoolkit altogether.
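
A rough sketch of what the ENA route could look like, using the ENA Portal API file report. The helper names are invented for illustration, and the parsing assumes the report comes back as TSV with a header row and the columns in the requested order (run accession, then a semicolon-separated `fastq_ftp` list without the `ftp://` prefix).

```shell
# Build the file-report URL for one accession (for example an SRX experiment).
ena_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$1"
}

# Read a file-report TSV on stdin and print one download URL per line.
# Assumption: column 2 is fastq_ftp, with several files separated by ";".
fastq_urls_from_report() {
    awk -F '\t' 'NR > 1 && $2 != "" {
        n = split($2, parts, ";")
        for (i = 1; i <= n; i++) print "ftp://" parts[i]
    }'
}
```

With a real accession in place of `ACCESSION`, the two pieces might be combined as `curl -s "$(ena_report_url ACCESSION)" | fastq_urls_from_report | wget -i -`.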

written 3.2 years ago by genomax

If the SRX accessions are consecutive, you could print all the wget commands through a loop and run them in the terminal.
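
A minimal sketch of such a loop, assuming the accessions are `SRX` followed by a zero-padded six-digit number and that every experiment directory sits under one common base URL. The function name and the base URL are placeholders, not real NCBI paths, and the recursion depth of 2 is again a guess.

```shell
# Print a recursive wget command for every SRX accession in a numeric range.
# Usage: print_srx_wget_cmds FIRST LAST BASE_URL
# Pass FIRST and LAST without leading zeros (e.g. 7, not 007), because
# printf would read a zero-prefixed number as octal.
print_srx_wget_cmds() {
    first=$1
    last=$2
    base=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        acc=$(printf 'SRX%06d' "$i")
        printf "wget -r -l2 --no-parent -nd -A '*.sra' '%s/%s/'\n" "$base" "$acc"
        i=$((i + 1))
    done
}
```

The output can be saved to a file, checked, and then run with `sh`.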

written 2.7 years ago by vinayjrao