API way to find "Original submitter" files in SRA?
11 weeks ago
predeus ★ 1.9k

Hi all,

If you work with single-cell data, you have probably seen many times how GEO/SRA butchers the "technical" reads, effectively converting single-cell 10X experiments into a strange sort of bulk RNA-seq.

Sometimes, however, the reads are also available as the original 10X BAMs submitted by the users. For example, for this run, the SRA page gives you an option to download the BAM from Amazon without having to pay cloud fees.

What I was wondering is whether it's possible to retrieve these links _en masse_ using something like Entrez utils. Normally, I would run something like this:

esearch -db sra -query SRR18070428 | efetch -format runinfo

However, the output in this case is as follows:

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR18070428,2023-07-01 15:41:47,2022-02-23 17:33:26,393550393,35419535370,0,90,5995,GCA_000001405.29,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-zq-20/SRR018/18070/SRR18070428/SRR18070428.lite.1,SRX14222131,,RNA-Seq,cDNA,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina NovaSeq 6000,SRP360500,PRJNA808248,3,808248,SRS12043620,SAMN26038351,simple,9606,Homo sapiens,GSM5906307,,,,,,,no,,,,,GEO,SRA1374480,,public,94B93456534C81E666680118B95804E0,AE8023C53DB00098682F3B51E2D46143

As you can see, there are no Amazon links there. Another tool that can look up files is srapath; however, running

srapath SRR18070428

produces an Amazon link to the same (useless) single-end SRA archive - not to the BAM file you can download manually.

If you have any ideas or knowledge as to how this could be automated, I would be most grateful.

All the best,

-- Alex

fastq SRA • 641 views

This information is not available via Entrez utils. You may need to scrape the SRA run pages instead.


You meant _not_ available, right? I'm not sure NCBI allows scraping; at least, my lame efforts at it failed. Oh well.


Yes. Corrected above.


Apparently it's possible! See the answer in this thread: API way to find "Original submitter" files in SRA?

NCBI is like Bash - you use it for 10 years and there's still a ton of caveats and options you have no idea about.

10 weeks ago
predeus ★ 1.9k

Thanks to the helpful SRA support team, I have learned something new!

Apparently, there is a tool named SDL (SRA Data Locator), which is extremely useful in this particular case. Running something like

curl -s "https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?acc=SRR18070426&accept-alternate-locations=yes" > SRR18070426.json

produces JSON-formatted output with all the necessary URLs inside; all you need to do is parse it.
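
For anyone who wants to do the parsing step for many runs at once, here is a minimal Python sketch. It calls the same endpoint with the same two parameters as the curl command above. Since I have not checked the exact layout of the SDL response, it does not rely on any particular field names: it simply walks the whole JSON and collects every string that looks like a URL. The helper names (fetch_sdl, collect_urls) and the list of URL prefixes are my own assumptions, not part of SDL, so treat this as a starting point rather than a finished tool.

```python
import json
import sys
import urllib.parse
import urllib.request

# Same endpoint as in the curl command above.
SDL_ENDPOINT = "https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve"

# Assumption: download locations appear as strings with one of these prefixes.
URL_PREFIXES = ("http://", "https://", "s3://", "gs://")


def fetch_sdl(accession):
    """Request the SDL record for one run accession and return the parsed JSON."""
    query = urllib.parse.urlencode(
        {"acc": accession, "accept-alternate-locations": "yes"}
    )
    with urllib.request.urlopen(f"{SDL_ENDPOINT}?{query}") as response:
        return json.load(response)


def collect_urls(node):
    """Recursively collect every URL-like string from parsed JSON.

    Deliberately schema-agnostic: it makes no assumption about which keys
    hold the links, only that links are strings starting with URL_PREFIXES.
    """
    urls = []
    if isinstance(node, dict):
        for value in node.values():
            urls.extend(collect_urls(value))
    elif isinstance(node, list):
        for item in node:
            urls.extend(collect_urls(item))
    elif isinstance(node, str) and node.startswith(URL_PREFIXES):
        urls.append(node)
    return urls


if __name__ == "__main__":
    # Prints one "accession<TAB>url" line per link found.
    for acc in sys.argv[1:]:
        for url in collect_urls(fetch_sdl(acc)):
            print(acc, url, sep="\t")
```

Saved as, say, sdl_links.py (a hypothetical file name), it could be run as "python sdl_links.py SRR18070426 SRR18070428" and the output filtered for links ending in .bam. Once you have looked at a real response, you may prefer to replace the generic walk with direct access to the relevant fields.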
