Downloading Multi-Experiment .sra Files from the NCBI Archive Automatically
3
5
11.7 years ago
narges ▴ 210

Hi all, to run some comparisons I need to download 161 raw dataset files from NCBI, listed at this link: http://www.ncbi.nlm.nih.gov/sites/entrez?db=sra&LinkName=pubmed_sra&from_uid=20220758 I need to save them to a cluster first, but I do not know the best way to download these files. Many thanks in advance for your help.

sra ncbi next-gen • 11k views
7
11.7 years ago
matted 7.8k

Here's a similar answer, but maybe useful if (like me) you don't like working with .SRA files.

Find the run(s) in the EBI ENA (http://www.ebi.ac.uk/ena/data/view/SRP001540 for yours).

Then click the "View: Text" link, which downloads a tab-delimited text file (named SRP001540 in this case). The 17th column has the full FTP link for each file.

Like Sukhdeep's example, you could then run something like the following, where tail -n +2 skips the header row:

cut -f 17 SRP001540 | tail -n +2 | xargs wget

The advantage of the ENA is that you can download FASTQ files directly, and skip the slow step of converting the .SRA files back to FASTQ.
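The column position in these reports can change, so picking the link column by its header name may be more robust than hard-coding 17. Below is a minimal sketch under stated assumptions: the report is tab-delimited with a header row, cells that list several files separate them with ";", and the column name fastq_ftp is only an example (check the header of your own file). The helper name extract_column is hypothetical.

```shell
#!/bin/sh
# Print every value found in a named column of a tab-separated report.
# Usage: extract_column REPORT COLUMN_NAME
# Assumption: the first line is a header; cells may hold several
# links separated by ";" (one per file of a run).
extract_column() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        $idx != "" {
            # Emit one link per output line.
            n = split($idx, parts, ";")
            for (j = 1; j <= n; j++) print parts[j]
        }
    ' "$1"
}

# Example use (the column name is an assumption, not taken from the thread):
# extract_column SRP001540 fastq_ftp | wget -i -
```

Rows with an empty link cell are skipped, so runs without a FASTQ file do not produce blank lines in the list passed to wget.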

0

+1 good one for just fastq's

6
11.7 years ago

On the page you linked, select all the experiments you want, then click Send to -> File -> Summary. This downloads a CSV file for the selected experiments, with an FTP link for each.

Now, on the cluster, make a directory and move the file into it. I assume the 14th column of that file holds the FTP links; change the column number in the following command as required. Note that cut splits on tabs by default, so add -d, if your file is comma-separated.

sed 1d file | cut -f14 | wget -i -

This will download all the experiment archives. Add -b to the wget command to run it in the background.
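If the summary file really is comma-separated, the one-liner above needs an explicit delimiter. A minimal sketch, assuming a simple CSV with one header row and no commas inside quoted fields (cut cannot parse those); the helper name links_from_summary and the file names are hypothetical:

```shell
#!/bin/sh
# Print the download links found in one column of a delimited summary file.
# Usage: links_from_summary FILE COLUMN_NUMBER [DELIMITER]
# Assumption: simple CSV, no delimiter characters inside quoted fields.
links_from_summary() {
    sed 1d "$1" |                      # drop the header row
        cut -d "${3:-,}" -f "$2" |     # keep the chosen column (comma by default)
        tr -d '"' |                    # strip surrounding quotes
        grep -E '^(ftp|https?)://'     # keep only lines that look like URLs
}

# Example use: save the list, then download in the background (-b)
# and resume partial files after an interruption (-c).
# links_from_summary summary.csv 14 > links.txt
# wget -b -c -i links.txt
```

Writing the links to a file first makes it easy to inspect the list before starting 161 downloads, and to rerun wget on the same list later.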

Cheers

0

Many thanks. Just one more thing: the information text file for this dataset is at http://eqtl.uchicago.edu/RNA_Seq_data/list_lanes_pickrell_2010_nature If you search this file for, say, NA19200, there are 4 results, 2 for each center (Yale and Argonne). I had assumed that this individual sample (NA19200) has two technical replicates, one from the Argonne center and one from the Yale center, but I cannot understand what the "2" after the individual's name means for the second sample from the same center. Why do they have both NA19200 and NA192002 for this individual?

0

They are library replicates. From the supplement to their paper: "In the course of examining variability between libraries, multiple libraries were prepared and sequenced for a subset of cell lines."

3
11.7 years ago

If your workflow includes R, you might take a look at the Bioconductor SRAdb package. It has functions for searching both SRA and ENA locally, finding and generating URLs, and downloading from SRA, including some simple functionality for scripting downloads with Aspera.

