Question about Finding SRA IDs
3.2 years ago
nickarsimet ▴ 30

Hi everyone, I'm new to sequencing and trying to get my bearings around the open-source tools that are available.

Currently, I'm trying to figure out what the SRA ID is for this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7863352/. The link to the NCBI BioProject says that the ID is 517527, but when I try to use the SRA tools to print the sequences using fastq-dump --stdout SRR517527, I get ~5000 entries in the output. This doesn't make sense, because the BioProject says that there are only 318 samples.

I'd really appreciate any advice on this issue, and especially an explanation of what exactly the difference is between SRR/ERR/DRR accessions and SRA (or a link to where I can read about the difference, I couldn't find a good guide on the NCBI website).


SRR/ERR/DRR accessions

SRR - Run data submitted to NCBI (SRA)
ERR - Run data submitted to EBI (ENA)
DRR - Run data submitted to DDBJ in Japan
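
To make the prefix rule concrete, here is a small sketch that maps an accession to the archive that issued it. The function name archive_for_accession is made up for illustration, and the PRJNA case is my own addition based on the project accession used elsewhere in this thread; it only inspects the prefix and does not check that the accession actually exists.

```shell
# Hypothetical helper: report where an accession was issued, by prefix only.
archive_for_accession() {
    case "$1" in
        SRR*)   echo "run, submitted to NCBI" ;;
        ERR*)   echo "run, submitted to EBI" ;;
        DRR*)   echo "run, submitted to DDBJ" ;;
        PRJNA*) echo "BioProject, registered at NCBI" ;;
        *)      echo "unknown prefix" ;;
    esac
}

# Example calls:
#   archive_for_accession SRR8504995
#   archive_for_accession PRJNA517527
```

The point for the original question is that a BioProject number and a run number live in separate namespaces, so putting SRR in front of a BioProject ID does not give you that project's data.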

3.2 years ago
GenoMax 142k

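There are 318 samples in this BioProject, whose accession is PRJNA517527. SRR517527 is an unrelated run that only happens to share the number, which is why its output did not match. You will have to fastq-dump each sample's run separately, using its own SRR accession. A condensed example below.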

Using EntrezDirect.

$ esearch -db sra -query "PRJNA517527" | efetch -format runinfo
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR8504995,2019-01-29 19:19:19,2019-01-29 19:03:17,1084763,219122126,1084763,202,146,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-20/SRR8504995/SRR8504995.1,SRX5309127,Clonal_chem1_D12,WGS,RANDOM,GENOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq 2000,SRP182873,PRJNA517527,,517527,SRS4306633,SAMN10836167,simple,562,Escherichia coli,Clonal_chem1_D12,,,,,,,no,,,,,STANFORD UNIVERSITY,SRA841247,,public,BF2846BEC65751981D48F46E5F623E0D,62D5A59794F4EC842542712E94D062CE

You can get the SRR run accessions with (truncated for brevity):

$ esearch -db sra -query "PRJNA517527" | efetch -format runinfo | grep -v "Run" | cut -d ',' -f1
SRR8504995
SRR8505002
SRR8504977
SRR8504976
SRR8505293
SRR8505291
SRR8504989
SRR8504998
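
Once you have that list, the per-run download can be scripted. The following is only a sketch, not a tested pipeline: the function names runs_from_runinfo and dump_runs, the DUMP_CMD override and the fastq output directory are all made up for illustration. It assumes Run is the first column of the runinfo table, as in the output above, and uses --split-files because the library layout is reported as PAIRED.

```shell
# Print the Run column of a runinfo CSV read from stdin.
# Skips blank lines and any (possibly repeated) header rows.
runs_from_runinfo() {
    awk -F',' '$1 != "" && $1 != "Run" { print $1 }'
}

# Run fastq-dump once per accession read from stdin (one per line).
# $1 is the output directory (defaults to "fastq").
# DUMP_CMD is a hypothetical override: set DUMP_CMD=echo for a dry run.
dump_runs() {
    outdir="${1:-fastq}"
    mkdir -p "$outdir"
    while IFS= read -r run; do
        ${DUMP_CMD:-fastq-dump} --split-files --gzip --outdir "$outdir" "$run"
    done
}

# Usage (needs network access, EntrezDirect and the SRA Toolkit):
#   esearch -db sra -query "PRJNA517527" | efetch -format runinfo \
#       | runs_from_runinfo | dump_runs fastq
```

Doing a dry run with DUMP_CMD=echo first lets you check the accession list before downloading anything.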

Or direct download links (truncated for brevity):

$ esearch -db sra -query "PRJNA517527" | efetch -format runinfo | grep -v "Run" | cut -d ',' -f10
https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-20/SRR8504995/SRR8504995.1
https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-20/SRR8505002/SRR8505002.1
https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-15/SRR8504977/SRR8504977.1
https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-21/SRR8504976/SRR8504976.1
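
The -f1 and -f10 positions above depend on the column order of the runinfo table. If you would rather select a column by its header name, something like the sketch below should work. The function name csv_column is made up for illustration, and it assumes no field contains an embedded comma, which holds for the row shown above but is not guaranteed for every project.

```shell
# Print one column, chosen by header name, from a headered CSV on stdin.
# Assumption: fields contain no embedded commas (no quoted CSV handling).
csv_column() {
    awk -F',' -v name="$1" '
        # First line is the header: remember the position of the wanted column.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        # Skip blank lines and repeated header rows, print everything else.
        col && $col != "" && $col != name { print $col }
    '
}

# Usage (needs network access and EntrezDirect):
#   esearch -db sra -query "PRJNA517527" | efetch -format runinfo \
#       | csv_column download_path
```

If the requested name is not in the header, the function prints nothing.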

Thank you so much! Exactly what I needed.
