How to download all SRA samples at once?
3
13
7.1 years ago
biorepine ★ 1.5k

Dear Biostars,

As you may know, SRA is a repository for all types of sequencing data. I often have to download data manually, copying the link of every SRA dataset by hand and fetching it with wget. Is there a simpler approach than copying links manually? Thanks in advance.

For example, how can I download all the data related to SRP026197? http://www.ncbi.nlm.nih.gov/sra?term=SRP026197

1

Have you tried the SRAdb package from bioconductor? It's been a while, but I think it can be used to do that sort of thing.

0

Actually, SRA is the repository for sequence data, not GEO. There are links between the two databases, but your question is actually related to SRA.

0

Oh yeah, you are right. I will edit my question. Thanks.

0

When I run the code on my computer, I get the error below. What is wrong?

trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
Content type 'application/x-gzip' length 1308358823 bytes (1247.7 Mb)
opened URL
downloaded 1247.7 Mb

Unzipping...

Error in .local(drv, ...) : Could not connect to database: unable to open database file

0

Perhaps you ran out of space in /tmp or the equivalent. Anyway, please post things like this as new questions.
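
One quick way to test the disk-space theory is to compare the free space in the download directory with the size reported in the log (1308358823 bytes compressed; the unpacked database is larger). A minimal shell sketch, assuming the file lands in the current directory; has_free_kb is a hypothetical helper name, not part of SRAdb:

```shell
# has_free_kb DIR KB: succeed if DIR's filesystem has at least KB kilobytes free.
# Hypothetical helper for illustration; assumes the filesystem name in df's
# output contains no spaces (otherwise the column count shifts).
has_free_kb() {
    # df -Pk prints one header line, then one line per filesystem in
    # 1024-byte units; field 4 is the available space
    avail_kb="$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge "$2" ]
}

# Example: the compressed download in the log is roughly 1.3 million KB
if has_free_kb . 1300000; then
    echo "enough room for the compressed file"
else
    echo "not enough free space here"
fi
```

Passing this check only rules out the compressed file as the problem; the unzipped SQLite file needs additional room.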

33
7.1 years ago

In R:

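source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb'); library(SRAdb)   # install and load the SRAdb package
srafile = getSRAdbFile()            # download and unzip the SRA metadata SQLite file
con = dbConnect('SQLite',srafile)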


Now we are ready to query the local SQLite database:

listSRAfile('SRP026197',con)


Results in:

        study    sample experiment       run                                                                                                           ftp
1   SRP026197 SRS449410  SRX311638 SRR913951 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra
2   SRP026197 SRS449476  SRX311704 SRR914066 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra
3   SRP026197 SRS449408  SRX311636 SRR913949 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311636/SRR913949/SRR913949.sra
....
247 SRP026197 SRS449508  SRX311735 SRR914158 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311735/SRR914158/SRR914158.sra
248 SRP026197 SRS449460  SRX311688 SRR914006 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311688/SRR914006/SRR914006.sra
249 SRP026197 SRS449509  SRX311736 SRR914160 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311736/SRR914160/SRR914160.sra


If you simply want to have R do the downloads for you, that is also straightforward:

getSRAfile('SRP026197',con,fileType='sra')


If you have access to the aspera client command line utility, ascp, you can have R use it instead of ftp, resulting in much greater download speeds. See the help for getSRAfile for details.

7

In my case, the solution above worked with some modifications - I had to install and load the DBI package first and then change the dbConnect line:

source('http://bioconductor.org/biocLite.R')
biocLite('DBI')
library(DBI)
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)


Without these modifications I got the message "Error: unable to find an inherited method for function 'dbConnect' for signature '"character"'".

0

Hi, I used this code but I have a problem:

biocLite('DBI')
library(DBI)
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

After downloading, I get this error:

Error in result_create(conn@ptr, statement) : database disk image is malformed

What should I do?

0

Hi, it is working great! However, I couldn't find a way to retrieve the information (e.g. the specific tissue of an RNA-Seq sample) related to a specific SRA number. Samples are usually labelled with GSE IDs rather than SRA IDs. Any suggestions would be appreciated!

0

You can use GEOmetadb to access NCBI GEO information in a similar way as for SRA data and SRAdb.

0

Yes, but I have already downloaded and processed a large number of SRA samples. All I want to do is rename them with the proper GEO ID. I didn't see any information on this in either of the packages :(

0

This comes a bit late, but you might want to try something like this:

library(GEOquery)
gse <- getGEO('GSE48138') # retrieves a list of sample sets for the GEO series linked to your SRA ID
## see what is in there:
show(gse)
## there are 2 sets of samples for that ID
## what you want is a table with the SRR IDs to download and some sample information;
## let's see what the first set contains:
df <- as.data.frame(gse[[1]])


The table above contains loads of information regarding the samples/files, IDs, etc. You will have to see what interests you, and use it to rename the files. I hope it helps.
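
Once the run-to-sample pairs have been exported from that table, the renaming itself can be done outside R. A minimal shell sketch, assuming a hypothetical tab-separated file with one "SRR ID, GEO sample ID" pair per line and downloaded files named RUN.sra; rename_by_map and the ID_RUN.sra naming scheme are illustrative choices, not something either package provides:

```shell
# rename_by_map MAP_FILE DIR
# MAP_FILE (hypothetical): tab-separated, one "run<TAB>geo_id" pair per line.
# Renames DIR/RUN.sra to DIR/GEOID_RUN.sra; runs without a file are skipped.
rename_by_map() (
    map_file="$1"
    dir="$2"
    tab="$(printf '\t')"
    while IFS="$tab" read -r run geo_id || [ -n "$run" ]; do
        # skip blank or incomplete lines
        if [ -z "$run" ] || [ -z "$geo_id" ]; then
            continue
        fi
        src="$dir/$run.sra"
        if [ -e "$src" ]; then
            mv -- "$src" "$dir/${geo_id}_$run.sra"
        fi
    done < "$map_file"
)
```

It would be called as rename_by_map map.tsv /path/to/sra_files, where both arguments are placeholders. Keeping the run ID in the new name means files stay distinct if several runs share one GEO sample.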

0

Hello there!

I am trying to extract the following SRA accession numbers with Bioconductor v3.1:

"SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950".

However, by running

getSRAfile(in_acc = c("SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950"), sra_con = sra_con,
           destDir = getwd(), fileType = 'sra', srcType='ftp')

I get error messages due to specific files, which I later confirm are available for download in SRAdownload, for example…

The error message:

Error in download.file(i, destfile = file.path(destDir, basename(i)),  :

Am I doing anything wrong?

0

srafile = getSRAdbFile()
trying URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
Error in download.file(url_sra, destfile = localfile, mode = "wb", method = method) :
  cannot open URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'

0

How can this be done in R for controlled-access data hosted at dbGaP, if we have the key file, rather than using prefetch/fastq-dump?

13
2.8 years ago

A non-R solution is to use the SRA toolkit prefetch command on a list of SRA identifiers.

First you need the list of run accessions, which you can download in one go. In your case, go to https://www.ncbi.nlm.nih.gov/sra?term=SRP026197 and, at the top right, click "Send to", then choose "File" and "Accession List".

Once you have it saved in a file (default is SraAccList.txt) you can use the command (tested in SRA toolkit 2.9.0):

prefetch $(<SraAccList.txt)

The .sra files will be downloaded to the default SRA folder. You can change it with this trick:

echo '/repository/user/main/public/root = "/path/to/download"' > $HOME/.ncbi/user-settings.mkfg
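
For a long accession list it can help to handle one accession at a time, so that a single failure is reported without stopping the rest. A minimal sketch under stated assumptions: fetch_accessions and the FETCH_CMD variable are hypothetical names introduced here, not part of the SRA toolkit, and the default is a dry run that only prints the commands:

```shell
# fetch_accessions LIST_FILE: run one command per accession in LIST_FILE.
# FETCH_CMD (hypothetical variable) defaults to "echo prefetch", a dry run;
# set FETCH_CMD=prefetch to actually download.
fetch_accessions() (
    list_file="$1"
    cmd="${FETCH_CMD:-echo prefetch}"
    # tr strips carriage returns in case the list has Windows line endings
    tr -d '\r' < "$list_file" | while IFS= read -r acc || [ -n "$acc" ]; do
        # skip blank lines
        if [ -z "$acc" ]; then
            continue
        fi
        # $cmd is left unquoted on purpose so "echo prefetch" splits into words;
        # stdin comes from /dev/null so the command cannot consume the list
        $cmd "$acc" < /dev/null || echo "failed: $acc" >&2
    done
)
```

Running fetch_accessions SraAccList.txt prints the prefetch commands; FETCH_CMD=prefetch fetch_accessions SraAccList.txt runs them.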

4

This is brilliant! It also works for fastq-dump:

fastq-dump --split-3 --gzip $(</path_to/SRR_Acc_List.txt)

0

Hi, I tried it, but it doesn't work for me. I have 976 files to download (SRA study SRP130211). I am able to download each SRR file separately, though.

./prefetch$(/home/data/yellow/SRR_Acc_List.txt)
SRR6483251: command not found
SRR6483252: command not found
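
Judging only from the output shown, the "command not found" lines look consistent with two differences from the original command: there is no space after prefetch, and $(...) without the < tries to execute the list file rather than read it, so each accession is run as if it were a command. A minimal sketch of the intended expansion; prefetch_cmdline is a hypothetical helper that only prints the command line, and $(cat FILE) is used as the portable spelling of $(<FILE), which is a bash/ksh/zsh feature:

```shell
# prefetch_cmdline LIST_FILE: print the prefetch command that the accession
# list expands to, without running it (hypothetical helper, dry run only).
prefetch_cmdline() {
    # The substitution is unquoted on purpose: word splitting turns each
    # line of the file into a separate argument.
    echo prefetch $(cat "$1")
}
```

Calling prefetch_cmdline /home/data/yellow/SRR_Acc_List.txt shows the command that would run; note the space between the command name and the substitution.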

1
3.1 years ago
vr ▴ 10

If you have a GSE accession, you can give this a try: https://github.com/pepkit/geofetch

The most important precondition is proper configuration of where you'd like the raw .sra files to be downloaded. You can also set some environment variables (that are mentioned in the command-line help for the tool geofetch.py) that will facilitate straightforward use. It can be as simple as something like:

/path/to/geofetch.py -i [GSE accession]