Question: Getting multiple *.sra files from NCBI using a list
4 months ago by
chrys30 wrote:

Hi there folks, I am trying to download SRA files for a dataset I compiled from the NIH Roadmap Data Matrix. The problem is that the dataset spans about 50-60 files. I have little intention of downloading them by hand, and I thought a quick wget would help. For some reason, though, each link provided just holds another subdirectory and does not point to the location of the actual *.sra file, which makes them a lot harder to download:



My script so far does this:

while read -r name files; do
    mkdir -p "$name"
    # -P sets the target directory; a leading "/" would point at the filesystem root
    wget "$files" -P "$name/"
done < List

I tried some different approaches like (and many others):

wget --no-parent -r -l1

My list just contains my sample names and the links provided by the data matrix. But wget seems to have problems reaching the subdirectory that holds the *.sra file unless I specify the path explicitly.
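One way to let wget descend into the subdirectory on its own is recursive retrieval combined with an accept filter. The following is only a minimal sketch: it assumes the list has two whitespace-separated columns (sample name, directory link) and that the server exposes a directory listing wget can follow. The function name fetch_sra_dir and the DRY_RUN variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: recurse into a listed directory and keep only the *.sra files.
# fetch_sra_dir and DRY_RUN are illustrative names, not part of wget.

fetch_sra_dir() {
    local name=$1 url=$2
    # -r -l2        recurse, at most two levels below the given URL
    # --no-parent   never ascend above the starting directory
    # -nd           do not recreate the remote directory tree locally
    # -A '*.sra'    keep only files matching this pattern
    # -P "$name"    save into a directory named after the sample
    # ${url%/}/     make sure the URL ends in exactly one slash
    local cmd=(wget -r -l2 --no-parent -nd -A '*.sra' -P "$name" "${url%/}/")
    if [[ -n ${DRY_RUN:-} ]]; then
        printf '%s\n' "${cmd[*]}"      # only show what would be run
    else
        mkdir -p "$name" && "${cmd[@]}"
    fi
}
```

With the function defined, the original loop becomes `while read -r name url; do fetch_sra_dir "$name" "$url"; done < List`. Setting DRY_RUN=1 first prints the wget commands without downloading anything, which makes it easy to check the links before committing to 50-60 transfers.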

If anybody has an idea how to solve this I would be eternally grateful, since by this time I would probably have finished downloading them by hand.

sra ncbi • 301 views

How about just using fastq-dump?

written 4 months ago by Pierre Lindenbaum101k

Well, it would work, but I would still need the explicit identifier of every experiment, since the GEO accession does not work. Also, the compression of *.sra is quite high, which suits me because I do not need the files all at once but iteratively. Unfortunately, the NIH Roadmap does not let me export the direct *.sra identifiers, only the subdirectories mentioned above. That means I would still have to look up the SRA IDs by hand, correct? Sorry if I overlooked something terribly obvious.
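The manual lookup can probably be scripted. Below is a rough sketch, assuming NCBI Entrez Direct (esearch, efetch) is installed and that each experiment can be found in the SRA database by some accession such as an SRX identifier; whether a given GEO accession resolves this way is not guaranteed. The helper names runs_from_runinfo and lookup_runs are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: turn an experiment accession into its SRR run IDs.
# Assumes NCBI Entrez Direct is on PATH; helper names are illustrative.

# Read a RunInfo CSV on stdin and print the Run column (first field),
# skipping header lines and blank lines.
runs_from_runinfo() {
    awk -F',' 'NF && $1 != "Run" { print $1 }'
}

# Query the SRA database for one accession and list its runs.
lookup_runs() {
    esearch -db sra -query "$1" | efetch -format runinfo | runs_from_runinfo
}
```

Each run ID printed by lookup_runs could then be handed to prefetch from the SRA Toolkit, which downloads the *.sra file, so the files stay compressed until they are actually needed.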

Short example:

Then I hit export, take the file and create my list from this. If I am doing something stupid please let me know. Thank you for your help.

written 4 months ago by chrys30

See if getting them from EBI-ENA is less painful. You could get fastq files directly, avoiding sratoolkit altogether.
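As a sketch of what that might look like: ENA offers a file report through its Portal API that lists the fastq locations for an accession. The endpoint and the field names below (read_run, fastq_ftp) are written as I understand them and should be checked against the ENA documentation; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list fastq download URLs for an accession via the ENA file report.
# Endpoint and field names are assumptions to verify; helper names are illustrative.

# Read a tab-separated file report on stdin, locate the fastq_ftp column
# by its header, and print one URL per file (entries are ';'-separated).
fastq_urls_from_report() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        col && $col != "" {
            n = split($col, parts, ";")
            for (j = 1; j <= n; j++) print "ftp://" parts[j]
        }'
}

# Fetch the report for one accession and convert it to URLs.
ena_fastq_urls() {
    curl -fsS "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=$1&result=read_run&fields=run_accession,fastq_ftp&format=tsv" |
        fastq_urls_from_report
}
```

The resulting URLs can be passed straight to wget, for example `ena_fastq_urls "$acc" | wget -i - -P "$name"`, which sidesteps both the subdirectory problem and the *.sra conversion step.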

written 4 months ago by genomax37k
