Hello, I need to import the following dataset to my server:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=ERP001736
It consists of 492 FASTQ files.
I know that I can go to the ERR files that are available here: https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERP001736&o=acc_s%3Aa
and use fastq-dump, but that appears to be the raw data, and running quality control on it could be a waste of time given that I can fetch the already-filtered reads directly onto my server.
So how can I download these files iteratively using wget or curl?
Additionally, the server's internet connection might drop during the download. Is there a way for wget or curl to resume the download if something like that happens?
Thanks for reading :)
I assume that, since the files at the first link have the word "clean" in their names, you think they are already processed? You could look at the source code for that page (right-click on the page --> "view page source" or similar, depending on your browser) and parse out all links that contain the ftp.sra.ebi.ac.uk URLs. Then use wget with a file listing these URLs to download the data.
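A minimal sketch of that approach, assuming you have saved the study page's HTML locally (here simulated with a small snippet; the ERR accessions and FTP paths below are placeholders, not real entries from ERP001736):

```shell
# Simulate the saved page source with two hypothetical FTP links.
# In practice, save the real page as study.html instead of this here-doc.
cat > study.html <<'EOF'
<a href="ftp://ftp.sra.ebi.ac.uk/vol1/ERA000/ERR000001/ERR000001_1.clean.fastq.gz">r1</a>
<a href="ftp://ftp.sra.ebi.ac.uk/vol1/ERA000/ERR000001/ERR000001_2.clean.fastq.gz">r2</a>
EOF

# Pull out the ftp.sra.ebi.ac.uk FASTQ URLs, one per line, deduplicated.
grep -oE 'ftp://ftp\.sra\.ebi\.ac\.uk[^"]+\.fastq\.gz' study.html | sort -u > urls.txt

# Download every URL in the list:
#   -i urls.txt  read URLs from the file
#   -c           resume partially downloaded files instead of restarting
#   --tries=0    keep retrying if the connection drops
# (commented out here so the sketch runs without network access)
# wget -c --tries=0 -i urls.txt
```

If you prefer curl, `curl -C - -O <url>` resumes a partial download the same way `wget -c` does, though you would loop over the URL file yourself (e.g. `xargs -n1 curl -C - -O < urls.txt`).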