Question: SRA archive metadata
tim.ivanov.9210 wrote:

I'm using this command to obtain metadata about an SRA archive:

esearch -db sra -query SRR1399843 | efetch -format runinfo

It returns the following information:

Run SRR1399843

ReleaseDate 2014-06-14 13:41:56

LoadDate 2014-10-04 03:42:33

spots 40704619

bases 6187102088

spots_with_mates 40704619

avgLength 152

size_MB 3177

AssemblyName GCF_000001405.25

download_path @dbgap@:reads/SRP012682/SRS637847/SRX599630/SRR1399843/SRR1399843.sra

Experiment SRX599630

LibraryName Solexa-227108

LibraryStrategy RNA-Seq

LibrarySelection cDNA


LibraryLayout PAIRED

InsertSize 150

InsertDev 311.773


Model Illumina HiSeq 2000

SRAStudy SRP012682

BioProject PRJNA75899


ProjectID 75899

Sample SRS637847

BioSample SAMN02791143

SampleType simple

TaxID 9606

ScientificName Homo sapiens

SampleName GTEX-13QIC-1626-SM-5K7TZ




Subject_ID 985098

Sex female


Tumor no


Analyte_Type RNA:Total RNA

Histological_Type Blood Vessel

Body_Site Artery - Tibial

CenterName BI

Submission SRA123108

dbgap_study_accession phs000424

Consent GRU

RunHash 478268EA67D40812258F63CDD4F1FE4A

ReadHash 4B32F0F08BF1C763FD72BCF414D77F76

How can I modify my request so that I can tell whether an archive has been mapped, i.e. whether it contains mapped (aligned) reads or raw reads?

written 3 months ago by tim.ivanov.9210
vkkodali960 wrote:

Access to the run accession in your example appears to be controlled. However, you can search for SRA data with aligned reads by adding the aligned_data[Properties] filter to your query, like this:

esearch -db sra -query 'Homo sapiens[Organism] AND aligned_data[Properties]'
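If you pipe that search through efetch -format runinfo (as in your original command), you get CSV with the run accession in the first column. Here is a minimal sketch for pulling out just those accessions. The function name runinfo_runs is made up for illustration, and it assumes the first CSV column is Run, as in the output you posted.

```shell
# Hypothetical helper: read runinfo CSV on stdin and print the Run column.
# Assumes Run is the first field; header lines (first field literally "Run")
# and blank lines are skipped.
runinfo_runs() {
  awk -F',' 'NF > 0 && $1 != "Run" && $1 != "" { print $1 }'
}

# Possible usage (needs network access and EDirect installed):
#   esearch -db sra -query 'Homo sapiens[Organism] AND aligned_data[Properties]' \
#     | efetch -format runinfo | runinfo_runs
```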
written 3 months ago by vkkodali960

Thank you for your reply!

I'm actually trying to obtain metadata for files I have already downloaded (they are controlled-access, but I have a key).

Can you explain what I am seeing in the output of your query?

Each line looks like this:

SRR7944888,2018-09-30 18:41:15,2018-09-30 18:26:12,11129735,1531891705,0,137,879,GCA_000001405.13,,SRX4779187,Z-138-REPLIg-E3-IonPlus,WGA,MDA,GENOMIC,SINGLE,0,0,ION_TORRENT,Ion Torrent PGM,SRP162960,PRJNA494024,,494024,SRS3859968,SAMN10147560,simple,9606,Homo sapiens,Z-138 (Mantle Cell Lymphoma) cell line,,,,,male,,no,,,,,UNIVERSITY OF VIGO,SRA787348,,public,C0C5E534A5AD060C2F8111B2208089E7,A7D604F965FB33846C8EB5810F31298E

Does this mean that the first field of each line (SRR7944888 in this example) is the ID of a project that does indeed contain aligned reads?

written 3 months ago by tim.ivanov.9210

If the reads are aligned, then the efetch output XML has the term AlignInfo and some associated data. If all you want to know is whether the SRR accession you have comes with aligned reads or not, you can probably do something like this:

## grep -c prints the number of matching lines; non-zero means AlignInfo is present
## this is your example; it has alignments
esearch -db sra -query 'SRR7944888' | efetch | grep -c 'AlignInfo'
## this example does not have aligned reads
esearch -db sra -query 'SRR299116' | efetch | grep -c 'AlignInfo'
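If you would rather get a yes/no answer than a count, a small wrapper along these lines might help. This is only a sketch: classify_sra_xml is a hypothetical name, and it relies on the same assumption as above, namely that the string AlignInfo shows up in the efetch XML only when the run has aligned reads.

```shell
# Hypothetical helper: read SRA XML on stdin and report whether it
# mentions AlignInfo anywhere.
classify_sra_xml() {
  if grep -q 'AlignInfo'; then
    echo "aligned"
  else
    echo "unaligned"
  fi
}

# Possible usage (needs network access and EDirect installed):
#   esearch -db sra -query 'SRR7944888' | efetch | classify_sra_xml
```

Because the function only reads standard input, it can also be run on an XML record you have already saved to disk.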
written 3 months ago by vkkodali960
