Download SRA raw data using efetch
22 days ago
biofysikos • 0

Before I ask my question, I would like to say that I am new to RNA-seq data.

I have been downloading data from the NCBI databases using efetch (i.e. Entrez). Is it possible to download raw sequence reads from the SRA database using the same method (including via Bio.Entrez in Biopython), or do I have to use the SRA Toolkit (https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/) exclusively?

sra-toolkit efetch biopython • 589 views

Not directly via efetch, but do have a look at https://sra-explorer.info/, a great tool for generating the command lines needed to download data (or whole datasets).


You can use the SRA Toolkit; that works.

22 days ago
GenoMax 149k

You can use efetch to query for metadata from SRA but you can't use efetch to download sequence data.

For example, you can do this:

$ efetch -db sra -id SRR23292069 -format runinfo
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR23292069,2023-10-10 12:21:24,2023-10-08 03:56:36,2759641173,629198187444,0,228,205294,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-zq-38/SRR023/23292/SRR23292069/SRR23292069.lite.1,SRX19235309,GSM7016416,OTHER,other,GENOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP420328,PRJNA929977,3,929977,SRS16639721,SAMN32979125,simple,9606,Homo sapiens,GSM7016416,,,,,,,no,,,,,COLUMBIA UNIVERSITY,SRA1582859,,public,2763C5E41A367906FEFC2A1BF8B5175B,062E01F712571F514888E2A144184C19

There is an embedded URL in the metadata above, in the download_path column (https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-zq-38/SRR023/23292/SRR23292069/SRR23292069.lite.1 ), that you could use to download some data. Judging by the file name in the link, this data is not the complete original submission, since it appears to be in SRA Lite format (LINK).
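A minimal sketch of pulling that URL out of the runinfo text programmatically, using only Python's standard csv module. The function name extract_download_paths is made up for illustration, and the sample string is a trimmed stand-in for the real output (three columns instead of the full header), not the complete record.

```python
import csv
import io


def extract_download_paths(runinfo_csv: str) -> dict[str, str]:
    """Map each run accession to its download_path from runinfo CSV text."""
    reader = csv.DictReader(io.StringIO(runinfo_csv))
    paths = {}
    for row in reader:
        run = (row.get("Run") or "").strip()
        url = (row.get("download_path") or "").strip()
        # Skip rows without a URL and any repeated header rows
        # (a header can reappear when several records are concatenated).
        if run and run != "Run" and url:
            paths[run] = url
    return paths


# Trimmed sample: only three of the runinfo columns are kept here.
sample = (
    "Run,spots,download_path\n"
    "SRR23292069,2759641173,"
    "https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-zq-38/"
    "SRR023/23292/SRR23292069/SRR23292069.lite.1\n"
)

for run, url in extract_download_paths(sample).items():
    print(run, url)
```

The input text can come from the efetch command shown above (for example, redirected to a file). With Biopython, the equivalent request is usually written as Bio.Entrez.efetch(db="sra", id="SRR23292069", rettype="runinfo", retmode="text"); that call is not exercised in this sketch, so check it against your Biopython version before relying on it.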


You don't need to use sra-toolkit exclusively, since in most cases it is possible to download FASTQ data from ENA. lieven.sterck has linked a tool that can generate those links.


Thank you, GenoMax. So, to recap, I can just use curl to download the URL extracted by efetch.

When you say ENA, do you mean the European Nucleotide Archive? And if so, how is lieven.sterck's link related to ENA?

Thank you once again.


"I can just use curl to download the URL extracted by efetch"

Yes, as long as you are aware of the limitations (e.g. there may be a single file for paired-end data, 10x data, etc.) and that the data you download may not be usable as is.
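For anyone who would rather stay in Python than call curl, here is a rough sketch of the same download step using only the standard library. The function name download and its parameters are illustrative choices, not part of any NCBI tool. Note that the runinfo row above reports a size_MB of 205294 for this run, so the transfer would be very large, and the resulting file is an SRA-format archive rather than FASTQ.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream `url` to the file `dest` and return the number of bytes written."""
    dest_path = Path(dest)
    # Copy in chunks so the whole file is never held in memory at once.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path.stat().st_size


# Example call (commented out because it would start a very large transfer):
# download(
#     "https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-zq-38/"
#     "SRR023/23292/SRR23292069/SRR23292069.lite.1",
#     "SRR23292069.lite.1",
# )
```

This sketch has no retry or resume logic, which curl or the SRA Toolkit handle better for files of this size.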

"do you mean the European Nucleotide Archive? And if yes, how is lieven.sterck's link related to ENA?"

Yes. The tool linked by lieven.sterck has a tutorial showing how it generates FASTQ download links from ENA: "sra-explorer: find SRA and FastQ download URLs in a couple of clicks".

Be sure to replace the ftp:// in the links with https:// since ENA no longer allows FTP connections via browser.
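If you have a whole list of links, the substitution can be scripted. This is a small sketch; the helper name to_https and the example.org link are placeholders for illustration, not real ENA paths. It also assumes that a link with no scheme at all should be given an https:// prefix, which may or may not match how your links are formatted.

```python
def to_https(link: str) -> str:
    """Rewrite an ftp:// link (or a scheme-less one) to use https://."""
    link = link.strip()
    if link.startswith("ftp://"):
        # Swap only the scheme; the host and path stay unchanged.
        return "https://" + link[len("ftp://"):]
    if "://" not in link:
        # Assumption: a bare host/path link is meant to be fetched over HTTPS.
        return "https://" + link
    return link


# Placeholder links, not real download locations.
links = [
    "ftp://ftp.example.org/vol1/fastq/EXAMPLE_1.fastq.gz",
    "ftp.example.org/vol1/fastq/EXAMPLE_2.fastq.gz",
]
for link in links:
    print(to_https(link))
```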


Thank you GenoMax! It all makes sense (for now).
