Question: Getting multiple *.sra files from NCBI using a list
chrys (30), Germany, wrote 4 months ago:

Hi there folks, I am trying to download SRA files for a dataset I compiled from the NIH Roadmap Data Matrix. The dataset spans about 50-60 files, so I have little intention of downloading them by hand. I thought a quick wget would help, but each link provided only leads to another subdirectory rather than to the actual *.sra file, which makes the download a lot harder:

List:

Sample1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX099/SRX099571
Sample2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX040/SRX040594

My script so far does this:

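# Each line of List holds: sample name, then FTP link
while read -r name files; do
    mkdir -p "$name"
    # -P takes a path relative to the current directory; "/$name/" would point at the filesystem root
    wget -P "$name/" "$files"
done < List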

I also tried several other approaches, for example:

wget --no-parent -r -l1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX099/SRX099571/

My list contains just my sample names and the links provided by the data matrix. But wget seems to have problems reaching the subdirectory with the *.sra file unless the path is specified explicitly.
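One possible way around this is sketched below. It assumes that each experiment directory (SRX...) holds one or more run subdirectories that contain the actual *.sra file, a layout the thread does not confirm, so wget has to recurse past the first level. The helper names normalize_dir_url and fetch_sra_dir are made up for illustration; the wget options used (-r, -np, -nd, -A, -P) are standard ones. The -l1 in the attempt above caps recursion at one level, which may be why the run subdirectory was never entered.

```shell
# Sketch only: assumes SRX directory -> run subdirectory -> *.sra file.

# Hypothetical helper: make sure a directory URL ends with a slash.
normalize_dir_url() {
    printf '%s/\n' "${1%/}"
}

# Hypothetical helper: fetch every *.sra below one experiment directory.
#   $1 = sample name (used as the local directory), $2 = experiment URL
fetch_sra_dir() {
    mkdir -p "$1"
    # -r   recurse into subdirectories (default depth limit, no -l1)
    # -np  never ascend to the parent directory
    # -nd  do not recreate the remote directory tree locally
    # -A   keep only files matching the pattern
    # -P   save everything under the sample directory
    wget -r -np -nd -A '*.sra' -P "$1" "$(normalize_dir_url "$2")"
}

# Usage, with the two-column file "List" from the question:
#   while read -r name url; do fetch_sra_dir "$name" "$url"; done < List
```

With -nd the *.sra files land directly in the sample directory instead of in a copy of the remote tree.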

If anybody has an idea how to solve this, I would be eternally grateful. By now I would probably have finished downloading them by hand.

Tags: sra, ncbi

How about just using fastq-dump? https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
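A minimal sketch of what the fastq-dump route could look like, assuming the SRA Toolkit is installed and the run accessions are already known, one per line. The function name dump_runs, the FASTQ_DUMP variable, and the file name runs.txt are hypothetical; --split-files, --gzip, and -O are fastq-dump options.

```shell
# Sketch only: FASTQ_DUMP can be overridden to point at another binary.
FASTQ_DUMP=${FASTQ_DUMP:-fastq-dump}

# Hypothetical helper: read run accessions from stdin and convert each one.
#   $1 = output directory for the FASTQ files
dump_runs() {
    outdir=$1
    mkdir -p "$outdir"
    while read -r run; do
        # Skip empty lines in the accession list.
        [ -n "$run" ] || continue
        # --split-files  write paired reads to separate files
        # --gzip         compress the output
        # -O             output directory
        # stdin is redirected so the tool cannot consume the accession list.
        "$FASTQ_DUMP" --split-files --gzip -O "$outdir" "$run" < /dev/null
    done
}

# Usage (runs.txt is a hypothetical file with one run accession per line):
#   dump_runs fastq < runs.txt
```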

written 4 months ago by Pierre Lindenbaum (101k)

Well, it would work, but I would still need the explicit identifier of every experiment, since the GEO accession does not work. The high compression of *.sra also suits me, since I do not need the files all at once but iteratively. Unfortunately, the NIH Roadmap does not let me export the direct *.sra identifiers, only the subdirectories mentioned above. That means I would still have to look up the SRA IDs by hand, correct? Sorry if I overlooked something terribly obvious.

Short example: https://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/?view=samples&sample=CD14%20primary%20cells

Then I hit export, take the file and create my list from this. If I am doing something stupid please let me know. Thank you for your help.
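The lookup may not have to be manual. The sketch below assumes that NCBI Entrez Direct (esearch, efetch) is installed and that the runinfo CSV it returns has a column headed "Run" with the run accessions. The function names are hypothetical, and the experiment accession is simply taken from the last path component of each exported link.

```shell
# Sketch only: maps an exported experiment link to its run accessions.

# Hypothetical helper: last path component of the link, e.g. SRX099571.
srx_from_url() {
    url=${1%/}
    printf '%s\n' "${url##*/}"
}

# Hypothetical helper: print the "Run" column of a runinfo CSV on stdin.
runs_from_runinfo() {
    awk -F ',' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
        col && $col != "" && $col != "Run" { print $col }
    '
}

# Hypothetical helper: query the SRA database for one experiment accession.
srx_to_runs() {
    # stdin is redirected so esearch cannot swallow the caller's input.
    esearch -db sra -query "$1" < /dev/null |
        efetch -format runinfo |
        runs_from_runinfo
}

# Usage, with the two-column file "List" from the question:
#   while read -r name url; do
#       srx_to_runs "$(srx_from_url "$url")"
#   done < List
```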

written 4 months ago by chrys (30)

See if getting them from EBI-ENA is less painful. You could get fastq files directly, avoiding sratoolkit altogether.
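A rough sketch of the ENA route, assuming the ENA Portal API filereport endpoint accepts the experiment accession and returns a tab-separated table whose fastq_ftp column holds semicolon-separated paths without the ftp:// scheme. The function names are hypothetical, and the endpoint and field names should be checked against the current ENA documentation before relying on them.

```shell
# Sketch only: lists FASTQ download URLs for one experiment accession.

# Hypothetical helper: build the filereport query URL for an accession.
ena_filereport_url() {
    printf '%s\n' "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=$1&result=read_run&fields=run_accession,fastq_ftp"
}

# Hypothetical helper: turn the TSV report on stdin into one URL per line.
fastq_urls_from_report() {
    awk -F '\t' '
        # Locate the fastq_ftp column from the header row.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        # Paired-end runs list several files separated by ";".
        col && $col != "" {
            n = split($col, parts, ";")
            for (j = 1; j <= n; j++) print "ftp://" parts[j]
        }
    '
}

# Usage (writes the URLs to a hypothetical file, then downloads them):
#   curl -fsS "$(ena_filereport_url SRX099571)" | fastq_urls_from_report > urls.txt
#   wget -i urls.txt
```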

written 4 months ago by genomax (37k)