different ways of downloading SRA metadata
24 months ago
Mathias ▴ 90

Hi all

I'm a little confused about where all the data is stored and how to retrieve the different pieces for a particular GEO study (GSE113957). I've already retrieved the fastq files using sratools, and I'm now looking at retrieving the sample metadata. I've also searched Biostars already, but a couple of different methods get suggested.

But I'd like to do it programmatically, or at least be able to download it on our server. So then there are several more options:

  • Use the Run info CGI
  • E-utilities URL call
  • E-utilities command line (Entrez Direct?)

I haven't tried the E-utilities yet, since I've got a metadata file using the Run info CGI:

wget -O ./SRP144355_info.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRP144355'

But the file I've retrieved this way contains more, and different, fields than the one retrieved from the Run Selector.
Could someone point out what the difference is, or if there is a preferred method?
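
For the "E-utilities URL call" option in the list above, here is a minimal sketch of what it could look like with curl. It first runs an esearch with usehistory=y, then passes the returned QueryKey and WebEnv to efetch with rettype=runinfo. The function names (xml_field, esearch_url, efetch_url, fetch_runinfo) are hypothetical helpers made up for this example, and the sketch does no paging with retstart/retmax, so treat it as a starting point rather than a tested recipe for large studies.

```shell
#!/usr/bin/env bash
# Sketch: fetch SRA run info through plain E-utilities URL calls.
# All function names here are hypothetical, not part of any NCBI tool.

EUTILS='https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

# Print the text inside the first <TAG>...</TAG> pair found on stdin.
xml_field() {
  sed -n "s:.*<$1>\\([^<]*\\)</$1>.*:\\1:p" | head -n 1
}

# URL for an esearch that stores its hits on the NCBI history server.
esearch_url() {
  printf '%s/esearch.fcgi?db=sra&term=%s&usehistory=y\n' "$EUTILS" "$1"
}

# URL for an efetch that returns the stored hits as a run info CSV.
efetch_url() {
  printf '%s/efetch.fcgi?db=sra&query_key=%s&WebEnv=%s&rettype=runinfo&retmode=text\n' \
    "$EUTILS" "$1" "$2"
}

# Run both calls for one accession and write the CSV to stdout.
fetch_runinfo() {
  local xml key env
  xml=$(curl -fsS "$(esearch_url "$1")") || return 1
  key=$(printf '%s\n' "$xml" | xml_field QueryKey)
  env=$(printf '%s\n' "$xml" | xml_field WebEnv)
  # Stop if the search response did not contain both history values.
  [ -n "$key" ] && [ -n "$env" ] || return 1
  curl -fsS "$(efetch_url "$key" "$env")"
}

# Usage (needs network access):
#   fetch_runinfo SRP144355 > SRP144355_info.csv
```

Nothing is downloaded when the file is sourced; the network calls only happen when fetch_runinfo is invoked.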

SRA GEO • 1.8k views

Hi. When I run the command you mentioned above (repeated below), the resulting file is empty.

wget -O ./SRP144355_info.csv 'http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=SRP144355'

And when I open the same URL in a web browser, it shows "HTTP ERROR 400". Do you know how to solve it?

24 months ago
GenoMax 141k

You should be able to get this information from SRA using Entrez Direct (there are 143 samples; only two are shown here as examples):

$ esearch -db sra -query PRJNA454681 | efetch -format runinfo 
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR7093892,2018-11-17 11:42:03,2018-05-02 14:26:33,22292412,1671930900,0,75,565,,https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7093892/SRR7093892,SRX4022539,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP144355,PRJNA454681,3,454681,SRS3243030,SAMN09011827,simple,9606,Homo sapiens,GSM3124643,,,,,,,no,,,,,GEO,SRA698774,,public,A04A18FF048292A7C08F44610FF9644F,9D194CE3DBD0D7663327F15C40DA1110
SRR7093893,2018-11-17 11:42:03,2018-05-02 14:23:32,11074462,830584650,0,75,281,,https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR7093893/SRR7093893,SRX4022540,,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,NextSeq 500,SRP144355,PRJNA454681,3,454681,SRS3243029,SAMN09011826,simple,9606,Homo sapiens,GSM3124644,,,,,,,no,,,,,GEO,SRA698774,,public,2D1372BD93EBE81264A845C294738123,1A74D57F233EB2B791D317ADED0C404F
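
Since the run info CSV above is very wide, it can help to pull out only the columns you need by header name. The following is a minimal sketch; pick_columns is a hypothetical helper name, and it assumes that no field contains a quoted comma (true for the rows shown here, but not guaranteed for every study).

```shell
#!/usr/bin/env bash
# Sketch: select columns from a run info CSV by header name.
# pick_columns is a hypothetical helper; it splits on plain commas
# and therefore assumes no field contains a quoted comma.

pick_columns() {
  awk -F',' -v cols="$1" '
    NR == 1 {
      # Map each header name to its column position.
      n = split(cols, want, ",")
      for (i = 1; i <= NF; i++) idx[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(want[j] in idx)) {
          print "missing column: " want[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Print the requested columns, tab-separated, header line included.
      line = ""
      for (j = 1; j <= n; j++) line = line (j > 1 ? "\t" : "") $(idx[want[j]])
      print line
    }
  '
}

# Usage:
#   esearch -db sra -query PRJNA454681 | efetch -format runinfo \
#     | pick_columns Run,SampleName,BioSample
```

For this study the SampleName column holds the GSM accession, so Run plus SampleName gives a simple SRR-to-GSM lookup table.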

Hi! I occasionally encounter the problem below when I run Entrez Direct. Have you encountered it before, and if so, how did you solve it? Thanks!

  curl: (52) Empty reply from server
     ERROR:  curl command failed ( Sun Apr 23 14:33:00 CST 2023 ) with: 52
    -X POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi -d query_key=1&WebEnv=MCID_6444d111ab8c5a52f11e167f&retstart=200&retmax=100&db=sra&rettype=runinfo&retmode=text&tool=edirect&edirect=19.3&edirect_os=Linux&email=test%40localhost
     WARNING:  FAILURE ( Sun Apr 23 14:32:59 CST 2023 )
    nquire -url https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ efetch.fcgi -query_key 1 -WebEnv MCID_6444d111ab8c5a52f11e167f -retstart 200 -retmax 100 -db sra -rettype runinfo -retmode text -tool edirect -edirect 19.3 -edirect_os Linux -email test@localhost
    EMPTY RESULT
    SECOND ATTEMPT
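
For intermittent failures like the one in this log, one possible workaround is to wrap the whole command in a retry loop with a growing pause between attempts. This is only a sketch: retry and RETRY_DELAY are hypothetical names, it does not address whatever makes the server drop the connection, and it assumes the wrapped command exits non-zero on failure (the usage line checks for a non-empty output file because I have not verified what exit status efetch returns in this case).

```shell
#!/usr/bin/env bash
# Sketch: re-run a flaky command a limited number of times.
# retry and RETRY_DELAY are hypothetical names for this example.

# Base pause in seconds; attempt k waits k * RETRY_DELAY before retrying.
: "${RETRY_DELAY:=5}"

# retry MAX CMD [ARGS...]: run CMD until it succeeds or MAX attempts fail.
retry() {
  local max=$1 attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep $((attempt * RETRY_DELAY))
    attempt=$((attempt + 1))
  done
}

# Usage: treat an empty output file as a failure and try up to 3 times.
#   retry 3 sh -c 'esearch -db sra -query PRJNA454681 \
#     | efetch -format runinfo > runinfo.csv && [ -s runinfo.csv ]'
```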