Question

different ways of downloading SRA metadata

24 months ago

Mathias ▴ 90

Hi all

I'm a little confused about where all the data is stored and how to retrieve the different pieces for a particular GEO study (GSE113957). I've already retrieved the fastq files using sratools, and I'm now looking at retrieving the sample metadata. I've searched Biostars already, but several different methods get suggested.

One option is to retrieve the metadata through the Run Selector.

But I'd like to do it programmatically, or at least be able to download it on our server. That leaves several more options:

Use the Run info CGI
E-utilities URL call
E-utilities command line (Entrez Direct?)

I haven't tried the E-utilities yet, since I've got a metadata file using the Run info CGI:

wget -O ./SRP144355_info.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRP144355'

But the file I retrieved this way contains more, and different, fields than the one downloaded from the Run Selector.
Could someone explain what the difference is, or whether there is a preferred method?
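One way to see exactly which fields differ is to compare the header rows of the two files. The following is a minimal sketch, not a definitive method: `header_fields` and `compare_headers` are hypothetical helper names, `SraRunTable.txt` is only an assumed name for the Run Selector download, and both files are assumed to be comma-separated with no commas inside the header names.

```shell
# Hypothetical helper: list the column names of a comma-separated file,
# one per line, sorted (carriage returns stripped in case of CRLF files).
header_fields() {
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | LC_ALL=C sort
}

# Hypothetical helper: print the columns that appear in only one of two files.
# Unindented lines exist only in the first file; tab-indented lines exist
# only in the second file.
compare_headers() {
    a=$(mktemp) && b=$(mktemp) || return 1
    header_fields "$1" > "$a"
    header_fields "$2" > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example call (file names are assumptions):
# compare_headers SRP144355_info.csv SraRunTable.txt
```

Columns shared by both files are suppressed, so the output lists only the fields that are specific to one source.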

SRA GEO • 1.8k views

Updated 12 months ago by GenoMax 141k • written 24 months ago by Mathias ▴ 90

Hi, when I run the command you mentioned above (repeated below), the resulting file is empty.

wget -O ./SRP144355_info.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRP144355'

And when I open the same URL in a web browser, it shows "HTTP ERROR 400". Do you know how to solve it?
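A failed download like this can be caught in a script before the file is used downstream. This is only a sketch under stated assumptions: `looks_like_runinfo` is a hypothetical helper, and it assumes that a valid runinfo file starts with a header whose first field is `Run`.

```shell
# Hypothetical check: succeed only if the file is non-empty and its first
# line starts with "Run," (assumed to be the first runinfo header field).
looks_like_runinfo() {
    [ -s "$1" ] || { echo "empty or missing: $1" >&2; return 1; }
    head -n 1 "$1" | grep -q '^Run,' || { echo "unexpected header in $1" >&2; return 1; }
}

# Example call (file name taken from the command above):
# looks_like_runinfo ./SRP144355_info.csv || echo "download failed"
```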

Reply written 12 months ago by claracen2021

GenoMax · Answer 1 · 2022-05-11

You should be able to get this information from SRA using Entrez Direct (there are 143 samples; only two are shown here as examples):

$ esearch -db sra -query PRJNA454681 | efetch -format runinfo 
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR7093892,2018-11-17 11:42:03,2018-05-02 14:26:33,22292412,1671930900,0,75,565,,https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7093892/SRR7093892,SRX4022539,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP144355,PRJNA454681,3,454681,SRS3243030,SAMN09011827,simple,9606,Homo sapiens,GSM3124643,,,,,,,no,,,,,GEO,SRA698774,,public,A04A18FF048292A7C08F44610FF9644F,9D194CE3DBD0D7663327F15C40DA1110
SRR7093893,2018-11-17 11:42:03,2018-05-02 14:23:32,11074462,830584650,0,75,281,,https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7093893/SRR7093893,SRX4022540,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP144355,PRJNA454681,3,454681,SRS3243029,SAMN09011826,simple,9606,Homo sapiens,GSM3124644,,,,,,,no,,,,,GEO,SRA698774,,public,2D1372BD93EBE81264A845C294738123,1A74D57F233EB2B791D317ADED0C404F
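Since the runinfo table is wide, it can help to keep only the columns of interest, for example to map each run to its GEO sample. The sketch below selects columns by header name with `awk`. `pick_columns` is a hypothetical helper, not part of Entrez Direct, and it assumes that no field contains an embedded comma.

```shell
# Hypothetical helper: print selected columns of a runinfo CSV by header name.
# $1 is a comma-separated list of header names; the CSV is read from stdin.
# Assumes a plain comma split is safe (no quoted fields containing commas).
pick_columns() {
    awk -F',' -v want="$1" '
        NR == 1 {
            # Map each header name to its column position.
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[names[j]]
            }
        }
        NF > 0 {
            # Emit the requested columns in the requested order (header included).
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line "," $(idx[j])
            print line
        }
    '
}

# Example call, reusing the Entrez Direct pipeline shown above:
# esearch -db sra -query PRJNA454681 | efetch -format runinfo \
#     | pick_columns Run,SampleName,BioSample
```

For the two rows shown above, this would pair SRR7093892 with GSM3124643 and SRR7093893 with GSM3124644, as those are the Run and SampleName values in the output.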