Question

How can I find SRA runs for fungal whole-genome sequences when only the BioSample ID is available?

0

5 months ago

Harshita • 0

I am using whole-genome sequences of fungal isolates from published studies to reconstruct a phylogeny together with my own isolates, but I cannot find SRA IDs for some of the isolates used in those studies. Is it possible that the data were submitted under a name different from the one I search for on NCBI, and is there a way to find the SRA IDs for those sequences?

For example, see the IDs used in this publication: https://journals.asm.org/doi/10.1128/mbio.01219-17

SRA whole-genome-sequencing NCBI • 585 views

ADD COMMENT • link updated 5 months ago by Ram 44k • written 5 months ago by Harshita • 0

score 1 · Answer 1 · 2024-05-20

1

5 months ago

GenoMax 147k

Looks like the samples are from this project: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA320483&o=acc_s%3Aa

This table has the SRA run accessions you need.

OR

Using Entrez Direct (only one record is shown here; there are 45 in total):

$ esearch -db bioproject -query PRJNA320483 | elink -target sra | efetch -format runinfo
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR14705994,2022-06-30 00:21:00,2021-06-02 01:53:15,23917379,7175213700,23917379,300,2603,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos4/sra-pub-zq-1/SRR014/14705/SRR14705994/SRR14705994.lite.1,SRX11043948,CpJA159_Lib,WGS,RANDOM,GENOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq 2000,SRP322222,PRJNA320483,,320483,SRS9112198,SAMN19488802,simple,318829,Pyricularia oryzae,CpJA159,,,,,,,no,,,,,UNIVERSITY OF KENTUCKY,SRA1239061,,public,701E33E57A319359F5465FD8283FA279,D7F5EC84290C2D1B8D4259ED5677314F
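
To match runs to the BioSample IDs listed in a paper, the runinfo CSV can be reduced to two columns. The following is a minimal sketch, not part of the original answer: `runinfo_to_pairs` is a hypothetical helper name, and it assumes that no field in the CSV contains an embedded comma. It looks the columns up by header name, so it does not depend on their position.

```shell
# Print "BioSample<TAB>Run" for each record of a runinfo CSV read from stdin.
# Assumption: fields never contain embedded commas.
runinfo_to_pairs() {
  awk -F',' '
    NR == 1 {
      # Map each header name to its column number.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Run" in col) || !("BioSample" in col)) exit 1
      next
    }
    # Skip blank lines and any repeated header line.
    $(col["Run"]) == "" || $(col["Run"]) == "Run" { next }
    { print $(col["BioSample"]) "\t" $(col["Run"]) }
  '
}

# Hypothetical usage (needs EDirect and network access):
#   esearch -db bioproject -query PRJNA320483 | elink -target sra |
#     efetch -format runinfo | runinfo_to_pairs
```

The same pipeline should also work when started from a single BioSample, by replacing the first step with `esearch -db biosample -query SAMN19488802`. A BioSample with no linked runs would then simply produce no rows.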

ADD COMMENT • link 5 months ago by GenoMax 147k

0

Thank you so much for helping me out in this!

I have used most of the sequences in this table. However, some samples, such as SAMN08009548, do not have SRA data. What can be done in that case?

ADD REPLY • link 5 months ago by Harshita • 0

0

For those samples, it appears that raw data were not submitted, only an assembly.

$ esearch -db biosample -query SAMN08009548 | elink -target assembly | efetch -format docsum
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE DocumentSummarySet>
<DocumentSummarySet status="OK">
  <DbBuild>Build240520-1130.1</DbBuild>
  <DocumentSummary>
    <Id>1559091</Id>
    <GbUid>6007438</GbUid>
    <AssemblyAccession>GCA_002925295.1</AssemblyAccession>
    <LastMajorReleaseAccession>GCA_002925295.1</LastMajorReleaseAccession>
    <ChainId>2925295</ChainId>
    <AssemblyName>ASM292529v1</AssemblyName>
    <Taxid>318829</Taxid>
    <Organism>Pyricularia oryzae (rice blast fungus)</Organism>
    <SpeciesTaxid>318829</SpeciesTaxid>
    <SpeciesName>Pyricularia oryzae</SpeciesName>
    <AssemblyType>haploid</AssemblyType>
    <AssemblyStatus>Scaffold</AssemblyStatus>
    <AssemblyStatusSort>6</AssemblyStatusSort>
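
When checking several BioSamples this way, only the assembly accession is usually of interest. The following is a minimal sketch, not part of the original reply: `assembly_accessions` is a hypothetical helper name, and it assumes each `<AssemblyAccession>` element sits on a single line, as in the output above. EDirect's own `xtract` tool (`xtract -pattern DocumentSummary -element AssemblyAccession`) is probably the more robust choice; plain `sed` is used here only to keep the example dependency-free.

```shell
# Print every AssemblyAccession value found in docsum XML read from stdin.
# Assumption: each <AssemblyAccession> element is on one line.
assembly_accessions() {
  sed -n 's:.*<AssemblyAccession>\([^<]*\)</AssemblyAccession>.*:\1:p'
}

# Hypothetical usage (needs EDirect and network access):
#   esearch -db biosample -query SAMN08009548 | elink -target assembly |
#     efetch -format docsum | assembly_accessions
```

The pattern matches only the `<AssemblyAccession>` tag, so the `<LastMajorReleaseAccession>` line is ignored.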

ADD REPLY • link 5 months ago by GenoMax 147k

0

It seems that the authors of this paper used only paired-end reads for their analysis, so they presumably had SRA IDs for these samples as well. I think the data are listed under a different name in the database, which is why I cannot find them.

ADD REPLY • link 5 months ago by Harshita • 0