Is there any tool for SRA metadata analysis?
18 months ago
Buffo ★ 2.4k

For example, I would like to summarize SRA metadata for a specific species: associated PubMed IDs (or citations), year of publication, country, sequencing technology, etc. I know I can use sratools, but I'm not sure it can give me all the information I need on its own; maybe in combination with E-utilities... Anyway, I was wondering if you know an easier/faster way to do this.

sratools eutilities sra • 1.3k views

ffq - https://github.com/pachterlab/ffq - can get you some metadata associated with SRA or GEO identifiers.


It's a cool tool, but I think it is not what I'm looking for.


I figured as much, since you want to search for a specific species, etc., which is why I left this as a comment rather than an answer. To get what you want, GenoMax's solution is a good one.

18 months ago
GenoMax 143k

You can get summaries via Entrez Direct (EDirect):

% esearch -db sra -query "escherichia coli [ORGN]" | efetch -format runinfo | head -3
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR22827022,2022-12-20 10:53:58,2022-12-20 10:36:13,1295942,386442244,1295942,298,210,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-1/SRR022/22827/SRR22827022/SRR22827022.lite.1,SRX18786462,Nextera DNA Flex,WGS,RANDOM,GENOMIC,PAIRED,0,0,ILLUMINA,Illumina MiSeq,SRP161673,PRJNA292663,3,292663,SRS16220280,SAMN32329316,simple,562,Escherichia coli,22CT11CB04-EC,,,,,,,no,,,,,PULSENET,SRA1562651,,public,8D3EB59682288C7F42C948FA50120BBD,03B3ADE4AF6A9CDE339FA1DC549EC2D0
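To turn the runinfo table into a summary (for example, runs per sequencing platform), one option is to tally the values of a single column. The following is only a minimal sketch: `count_by_column` is a hypothetical helper name, not part of EDirect, and it assumes that no field contains an embedded comma, since it splits naively on `,`.

```shell
# Count how often each value of a named runinfo column occurs.
# Usage: count_by_column COLUMN < runinfo.csv
# Assumption: no field contains an embedded comma (naive split on ",").
count_by_column() {
  awk -F',' -v col="$1" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next
    }
    # Defensive: skip blank lines and any repeated header rows.
    $0 == "" || $1 == "Run" { next }
    { counts[$idx]++ }
    END { for (v in counts) printf "%d\t%s\n", counts[v], v }
  ' | sort -k1,1nr -k2
}
```

Saving the `efetch -format runinfo` output to a file such as `runinfo.csv` and running `count_by_column Platform < runinfo.csv` would then list each platform with its run count; `Model` or `LibraryStrategy` can be tallied the same way.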

There are over 330,000 accessions for E. coli:

% esearch -db sra -query "escherichia coli [ORGN]"
<ENTREZ_DIRECT>
  <Db>sra</Db>
  <QueryKey>1</QueryKey>
  <Count>331880</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
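If only the hit count is needed, for instance to check the size of a query before fetching its runinfo, the number can be pulled out of the XML above. This is a small sketch using `sed`; `hit_count` is a hypothetical helper name, and it assumes the `<Count>` element sits on one line as shown. With EDirect installed, `xtract -pattern ENTREZ_DIRECT -element Count` should achieve the same.

```shell
# Print the number inside the first <Count> element read from stdin.
# Usage: esearch -db sra -query "escherichia coli [ORGN]" | hit_count
hit_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}
```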

This is not going to give you PubMed IDs, though. Those may need to be obtained separately:

% esearch -db sra -query "SRR22827027" | elink -target pubmed | esummary | xtract -pattern DocumentSummary -element Id,FullJournalName,ELocationID
34015113    Journal of food protection  doi: 10.4315/JFP-21-005
32513803    Antimicrobial agents and chemotherapy   doi: 10.1128/AAC.00573-20
31866986    Frontiers in microbiology   doi: 10.3389/fmicb.2019.02826

Yes, that's what I thought. Many thanks!
