Question: How to download all SRA samples at once?
biorepine (1.2k), Spain, wrote 3.5 years ago:

Dear Biostars,

As you may know, SRA is a repository for all types of sequencing data. I often have to download datasets manually, copying the link of every SRA dataset by hand and fetching it with wget. Is there a simpler approach than copying the links manually? Thanks in advance.

For example: how can I download all the data related to SRP026197? http://www.ncbi.nlm.nih.gov/sra?term=SRP026197

geo sra download • 16k views
modified 16 months ago by Ada (0) • written 3.5 years ago by biorepine (1.2k)

Have you tried the SRAdb package from bioconductor? It's been a while, but I think it can be used to do that sort of thing.

written 3.5 years ago by Devon Ryan (70k)

Actually, SRA is the repository for sequence data, not GEO. There are links between the two databases, but your question is actually related to SRA.

written 3.5 years ago by Sean Davis (23k)

Oh yes, you are right. I will edit my question. Thanks.

written 3.5 years ago by biorepine (1.2k)

Here is another solution: see the answer to "How to download raw sequence data from GEO/SRA".

written 2.9 years ago by Istvan Albert ♦♦ (73k)
Sean Davis (23k), National Institutes of Health, Bethesda, MD, wrote 3.5 years ago:

In R:

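# Install and load SRAdb from Bioconductor
source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
# Download the SRA metadata SQLite file (a large download) and connect to it
srafile = getSRAdbFile()
con = dbConnect('SQLite',srafile)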

Now we are ready to query the local SQLite database:

listSRAfile('SRP026197',con)

Results in:

        study    sample experiment       run                                                                                                           ftp
1   SRP026197 SRS449410  SRX311638 SRR913951 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra
2   SRP026197 SRS449476  SRX311704 SRR914066 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra
3   SRP026197 SRS449408  SRX311636 SRR913949 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311636/SRR913949/SRR913949.sra
....
247 SRP026197 SRS449508  SRX311735 SRR914158 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311735/SRR914158/SRR914158.sra
248 SRP026197 SRS449460  SRX311688 SRR914006 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311688/SRR914006/SRR914006.sra
249 SRP026197 SRS449509  SRX311736 SRR914160 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311736/SRR914160/SRR914160.sra

If you simply want to have R do the downloads for you, that is also straightforward:

getSRAfile('SRP026197',con,fileType='sra')

If you have access to the aspera client command line utility, ascp, you can have R use it instead of ftp, resulting in much greater download speeds. See the help for getSRAfile for details.

written 3.5 years ago by Sean Davis (23k)
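If you would rather keep using wget, as in the question, the ftp column of the listSRAfile() output above can be written to a plain text file and handed to wget -i. The following is only a minimal sketch and not part of the original answer: it assumes the result is a data frame with an ftp column as shown, and the helper name write_url_list and the file name sra_urls.txt are made up for illustration.

```r
# Hypothetical helper: save the unique, non-missing FTP links from a
# listSRAfile()-style data frame so that `wget -i <file>` can fetch them.
write_url_list <- function(sra_files, path = "sra_urls.txt") {
  urls <- unique(as.character(sra_files$ftp))
  urls <- urls[!is.na(urls) & nzchar(urls)]
  writeLines(urls, path)
  invisible(urls)
}

# Toy input with the same columns as the listSRAfile() output.
toy <- data.frame(
  run = c("SRR913951", "SRR914066"),
  ftp = c(
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311638/SRR913951/SRR913951.sra",
    "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX311/SRX311704/SRR914066/SRR914066.sra"
  ),
  stringsAsFactors = FALSE
)
url_file <- file.path(tempdir(), "sra_urls.txt")
urls <- write_url_list(toy, path = url_file)
```

With the real table this would be write_url_list(listSRAfile('SRP026197', con)), followed by wget -i sra_urls.txt in a shell.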

In my case, the solution above worked with some modifications - I had to install and load the DBI package first and then change the dbConnect line:

source('http://bioconductor.org/biocLite.R')
biocLite('SRAdb')
library(SRAdb)
biocLite('DBI')
library(DBI)
srafile = getSRAdbFile()
con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)

Without these modifications I got the message "Error: unable to find an inherited method for function 'dbConnect' for signature '"character"'".

written 2.1 years ago by adumitri (50)

Hi, it is working great! However, I couldn't find a way to retrieve the information (for example, a specific tissue RNA-Seq) related to a specific SRA number. They are usually labeled by GSE ids rather than SRA ids. Any suggestions would be appreciated!

written 2.8 years ago by biorepine (1.2k)

You can use GEOmetadb to access NCBI GEO information in a similar way as for SRA data and SRAdb.

written 2.8 years ago by Sean Davis (23k)

Yes, but I have already downloaded and processed a large number of SRA samples. All I want to do is rename them with the proper GEO id. I didn't see any information on this in either of the packages :(

written 2.8 years ago by biorepine (1.2k)

This comes a bit late, but you might want to try something like this:

library(GEOquery)
gse <- getGEO('GSE48138') # retrieves a list of sample sets for your GEO series id
## see what is in there:
show(gse)
## there are 2 sets of samples for that ID
## what you want is a table with the SRR to download and some sample information
## let's see what the first set contains:
df <- as.data.frame(gse[[1]])
head(df)

The table above contains loads of information regarding the samples/files, IDs, etc. You will have to see what interests you and use it to rename the files. I hope it helps.

written 2.5 years ago by A. Domingues (1.4k)
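To make the renaming step concrete, here is a rough sketch of one way to link GEO sample ids to downloaded runs. It rests on assumptions that should be checked against your own tables: that the GEO sample table has a geo_accession column and a relation column holding a link that contains the SRX accession (in some series the link may sit in a differently named column such as relation.1), and that the SRA table has experiment and run columns as in the listSRAfile() output. The helper name map_runs_to_geo and the GSM ids in the toy data are invented for illustration.

```r
# Hypothetical helper: link GEO sample ids (GSM) to SRA runs (SRR) through the
# SRX accession found in the GEO "relation" text.
# Assumed inputs (check the column names in your own tables):
#   geo: data frame with columns geo_accession and relation
#   sra: data frame with columns experiment and run
map_runs_to_geo <- function(geo, sra) {
  relation <- as.character(geo$relation)
  # Pull the first SRX/ERX/DRX accession out of each relation string;
  # rows without a match stay NA and drop out of the merge below.
  hit <- regexpr("[SED]RX[0-9]+", relation)
  srx <- rep(NA_character_, length(relation))
  srx[hit > 0] <- regmatches(relation, hit)
  links <- data.frame(
    geo_accession = as.character(geo$geo_accession),
    experiment = srx,
    stringsAsFactors = FALSE
  )
  out <- merge(sra[, c("experiment", "run")], links, by = "experiment")
  # Proposed new file name: GEO sample id followed by the run accession.
  out$new_name <- paste0(out$geo_accession, "_", out$run, ".sra")
  out
}

# Toy data; the GSM ids are placeholders, not real accessions for this study.
geo <- data.frame(
  geo_accession = c("GSM0000001", "GSM0000002"),
  relation = c(
    "SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX311638",
    "SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX311704"
  ),
  stringsAsFactors = FALSE
)
sra <- data.frame(
  experiment = c("SRX311638", "SRX311704"),
  run = c("SRR913951", "SRR914066"),
  stringsAsFactors = FALSE
)
renames <- map_runs_to_geo(geo, sra)
```

The resulting table could then drive file.rename(paste0(renames$run, ".sra"), renames$new_name), after checking a few rows by eye.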

Hello there!

I am trying to extract the following SRA accession numbers with Bioconductor v3.1:

"SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950".

 

However, by running

getSRAfile(in_acc = c("SRP041432","ERP010058","SRP032486","SRP048789","SRP016517","ERP010240","SRP042345","SRP050383","SRP039499","SRP024388","SRP039009","SRP040131","SRP010723","ERP010570","SRP045342","ERP002340","ERP003677","SRP040950"), sra_con = sra_con,
           destDir = getwd(), fileType = 'sra', srcType='ftp')

I get error messages for specific files, which I later confirmed are available for download in SRAdownload. For example:

trying URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'
Error in download.file(i, destfile = file.path(destDir, basename(i)),  :
  cannot open URL 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/ERX/ERX219/ERX219608/ERR245074/ERR245074.sra'

Am I doing anything wrong?

written 23 months ago by massacomgrao (0)
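When one unreachable file aborts a whole batch, as in the error above, one workaround is to download the URLs one at a time and record failures instead of stopping. This is only a sketch under stated assumptions, not how getSRAfile works internally: download_all is a made-up helper, and the fetch argument exists so the logic can be shown with a stand-in function rather than real network access.

```r
# Hypothetical helper: try each URL in turn and record failures instead of
# stopping at the first one. `fetch` can be any function(url, destfile) that
# signals an error when the download fails.
download_all <- function(urls, dest_dir = getwd(), fetch = download.file) {
  ok <- vapply(urls, function(u) {
    dest <- file.path(dest_dir, basename(u))
    tryCatch({
      fetch(u, dest)
      TRUE
    }, error = function(e) {
      message("Failed: ", u, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
  data.frame(url = urls, ok = unname(ok), stringsAsFactors = FALSE)
}

# Stand-in for a real download: fails for one run, succeeds otherwise.
fake_fetch <- function(url, destfile) {
  if (grepl("ERR245074", url)) stop("cannot open URL")
  invisible(0L)
}

# Placeholder URLs for the demonstration only.
res <- download_all(
  c("ftp://example.org/SRR000001.sra", "ftp://example.org/ERR245074.sra"),
  dest_dir = tempdir(),
  fetch = fake_fetch
)
```

For real downloads, fetch could be function(url, destfile) download.file(url, destfile, mode = "wb"), and the rows with ok equal to FALSE can be retried later.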

srafile = getSRAdbFile()
trying URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
Error in download.file(url_sra, destfile = localfile, mode = "wb", method = method) :
  cannot open URL 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'

written 8 months ago by kevinchjp (10)
Ada (0) wrote 16 months ago:

When I run the code on my computer, I get the problem below. What is wrong?

library(SRAdb)

srafile=getSRAdbFile()

trying URL 'http://gbnci.abcc.ncifcrf.gov/backup/SRAmetadb.sqlite.gz'
Content type 'application/x-gzip' length 1308358823 bytes (1247.7 Mb)
opened URL
downloaded 1247.7 Mb

Unzipping...

Error in .local(drv, ...) : Could not connect to database: unable to open database file

modified 16 months ago • written 16 months ago by Ada (0)

Perhaps you ran out of space in /tmp or the equivalent. Anyway, please post things like this as new questions.

modified 16 months ago • written 16 months ago by Devon Ryan (70k)
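One quick way to test the disk-space guess is to check that the unpacked database file actually exists and is not empty before connecting. The sketch below is only an illustration: check_db_file is a made-up helper, and the temporary file stands in for the real SRAmetadb.sqlite.

```r
# Hypothetical check: confirm that a database file exists and is not empty
# before handing it to dbConnect().
check_db_file <- function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  size <- file.info(path)$size
  if (is.na(size) || size == 0) stop("File is empty: ", path)
  invisible(size)
}

# Placeholder file standing in for the unpacked SRAmetadb.sqlite.
tmp <- tempfile(fileext = ".sqlite")
writeLines("placeholder", tmp)
size <- check_db_file(tmp)
```

On a real run this would be check_db_file(srafile) right after getSRAdbFile(). If the check fails, rerunning the download from a directory with enough free space is a reasonable next step.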