Get a complete GSM-to-SRX/SRR table
4.7 years ago
predeus

Hello all,

When working with publicly available data, one often has to find the data sets in GEO first and then download the raw reads from the SRA database. I've been using NCBI's Entrez Direct tools for this.

For example, I can run the following query:

esearch -db sra -query GSM1467783 | efetch -format runinfo


And get output in the following CSV format (header row shown):

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
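To run the same query for several GSM IDs and keep only the run accessions, the pipeline can be wrapped in a small shell function. This is a sketch, assuming Entrez Direct (`esearch`, `efetch`) is on the `PATH` and that `Run` stays the first column of the runinfo CSV, as in the header above; the function names `extract_runs` and `gsm_to_runs` are hypothetical.

```shell
# Print "GSM<TAB>Run" pairs from runinfo CSV read on stdin.
# Only rows whose first column looks like a run accession (SRR/ERR/DRR)
# are kept, which drops the header row and any blank lines.
extract_runs() {
    local gsm="$1"
    awk -F',' -v gsm="$gsm" '$1 ~ /^[SED]RR[0-9]+$/ { print gsm "\t" $1 }'
}

# Query SRA once per GSM ID given as an argument (needs network access).
gsm_to_runs() {
    local gsm
    for gsm in "$@"; do
        esearch -db sra -query "$gsm" | efetch -format runinfo | extract_runs "$gsm"
    done
}
```

Usage: `gsm_to_runs GSM1467783 > gsm_runs.tsv`. This still queries NCBI online for every ID, so it does not solve the offline part of the question.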


Now I'm looking for a way to do this offline: to generate a large reference table of GSM-to-.sra file correspondences, so that by specifying one or several GSM IDs I can download all the relevant .sra files from the FTP server.

I've found some R package wrappers around SQL databases; however, I'm puzzled that no table in SRAdb (I believe I've checked them all) includes GSM IDs. I also haven't found the SRX/SRR IDs in the GEOmetadb tables, although I might have missed something there.

At any rate, is there any database that establishes the correspondence between GSM and SRX/SRR identifiers?

If anybody can help me out with this, I'd be very grateful.


Hello,

The file SRA_Accessions.tab helped me extract the SRR-to-GSM mapping, but what about DRR and ERR accessions? As far as I can tell, it does not contain ERR accessions at all. I found some DRR accessions, but none of them were linked to a particular GSM ID.

Is there another file that contains DRR or ERR codes?


Please search for your issue first and, if it hasn't been answered, open a new question rather than reviving old threads :)


Thanks for the info about how to get it from Entrez Direct!

@Helen, might this help you if you don't need it to be offline?

4.7 years ago
Gregor Sturm

Have a look at the SRA FTP server. It provides a file called SRA_Accessions.tab that links various identifiers to each other.

With

grep ^SRR SRA_Accessions.tab | grep GSM


you can extract the SRR-to-GSM mapping.
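The same idea can be extended to write a two-column GSM-to-SRR table once and then look up any list of GSM IDs offline. A minimal sketch, assuming SRA_Accessions.tab is tab-separated with the run accession in the first column and that the GSM ID appears somewhere on the run's line (which is what the `grep GSM` above relies on); the function names `build_gsm_table` and `lookup_gsms` and the file names `gsm_to_srr.tsv` and `my_gsms.txt` are hypothetical.

```shell
# Build a "GSM<TAB>SRR" table from SRA_Accessions.tab (path given as $1).
# match() picks the first GSM<digits> token anywhere on the line, so the
# exact column holding it does not matter, and suffixes such as "_r1" are dropped.
build_gsm_table() {
    awk -F'\t' '$1 ~ /^SRR/ && match($0, /GSM[0-9]+/) {
        print substr($0, RSTART, RLENGTH) "\t" $1
    }' "$1"
}

# Print the rows of the table ($2) whose GSM ID is listed, one per line, in $1.
lookup_gsms() {
    awk -F'\t' 'NR == FNR { want[$1]; next } $1 in want' "$1" "$2"
}
```

Usage:

    build_gsm_table SRA_Accessions.tab > gsm_to_srr.tsv
    lookup_gsms my_gsms.txt gsm_to_srr.tsv | cut -f2

The SRR IDs in the second column can then be passed to `prefetch` from the SRA Toolkit.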


This is exactly what I've been looking for. Thank you very much!


This really helped me. Thank you.

SRA_Accessions.tab is very useful for getting SRR accessions, but it doesn't seem to contain any ERR accessions, and the DRR accessions it does contain don't seem to have any GEO accessions listed. Hence, if I have a list of GEO accessions and write a script to get their corresponding SRA runs, I only find the runs with SRR accessions and not those with ERR or DRR accessions.

Is there an offline solution that would work for all *RR accession codes?
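One way to check which run prefixes a given copy of the file actually contains is to tally the first three characters of the accession column. A quick sketch, assuming the first line is a header and the accession is in the first tab-separated column; `count_run_prefixes` is a hypothetical helper name.

```shell
# Count the run accessions (SRR/ERR/DRR) in the file given as $1, per prefix.
# The header line is skipped; sort makes the output order deterministic.
count_run_prefixes() {
    awk -F'\t' 'NR > 1 && $1 ~ /^[SED]RR/ { n[substr($1, 1, 3)]++ }
        END { for (p in n) print p "\t" n[p] }' "$1" | sort
}
```

Usage: `count_run_prefixes SRA_Accessions.tab`. A prefix that is missing from the output is not present in that copy of the file at all.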

3.8 years ago
predeus

Note that the table seems to be updated regularly: the current file is now twice the size it was 11 months ago.
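Since the file keeps growing, it may be worth re-downloading it only when the server copy is newer than the local one. A sketch using curl's time-conditional download; the URL below is an assumption about where NCBI currently hosts the file and should be checked against the SRA FTP site, and `refresh_accessions` is a hypothetical name.

```shell
# Assumed location of the file on NCBI's FTP/HTTPS server; verify before use.
SRA_ACC_URL="https://ftp.ncbi.nlm.nih.gov/sra/reports/Metadata/SRA_Accessions.tab"

# Download SRA_Accessions.tab only if the remote copy is newer than the local one.
# -z FILE sends If-Modified-Since based on FILE's modification time;
# -R stamps the downloaded file with the server's modification time,
# so the next comparison is made against the server's clock.
refresh_accessions() {
    local out="${1:-SRA_Accessions.tab}"
    if [ -e "$out" ]; then
        curl -fsSL -R -z "$out" -o "$out" "$SRA_ACC_URL"
    else
        curl -fsSL -R -o "$out" "$SRA_ACC_URL"
    fi
}
```

Running `refresh_accessions` from a weekly cron job would keep a local copy reasonably current without fetching the whole file every time.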