Question: Get SRA IDs from GEO
7
5.6 years ago by
United Kingdom
tomislav.ilicic120 wrote:

Hi,

I am trying to get all sample ids (SRS ids) from a GEO ID. For example, I am trying to fetch all SRS ids belonging to GSE44183.

Is there any way to get these programmatically? I tried using the NCBI E-utilities, but I couldn't construct the right query.

Help would be very much appreciated. Best, Tomi

geo ncbi • 5.1k views
modified 11 weeks ago by j.aryaman2520 • written 5.6 years ago by tomislav.ilicic120

You need to clarify this question. First, the title refers to "SRA ids". However, the question then uses "SRS ids", twice. Which is it? I suspect SRA.

Second, you need to define and give an example of the "sample ids" you want to retrieve. For this type of GEO record, one could retrieve GEO sample IDs (starting with GSM), or SRA read IDs (starting with SRR), or even SRX IDs. So please, define clearly what you want to do.

modified 5.6 years ago • written 5.6 years ago by Neilfws48k

Hi, 

I apologize, I got a bit confused with the number of different ids in this case. 

I have around 40 GSE IDs, and for each one (e.g. GSE44183) I want to download all of the associated sequencing data. My plan was to use fastq-dump, which needs SRA IDs as input, so I am trying to fetch all SRA IDs belonging to a GSE. Maybe this is not the right approach, but I couldn't think of an easier way to download all the data.

Hope this is clear now. 

 

written 5.6 years ago by tomislav.ilicic120

So you want sequencing run accessions, i.e. SRR?

written 5.6 years ago by Neilfws48k

Yes. From GSE ids. 

written 5.6 years ago by tomislav.ilicic120
5
5.6 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Not sure that you can get from GSE to SRR in one step, but EUtils is definitely the way to go.

You can get from GSE to SRX using EDirect like this (using head to show the first 5 results):

esearch -db gds -query "GSE44183[ACCN] AND GSM[ETYP]" | efetch -format docsum | \
xtract -pattern ExtRelation -element RelationType,TargetObject | head -5

SRA    SRX300901
SRA    SRX300900
SRA    SRX300899
SRA    SRX300898
SRA    SRX300897

Then you could write the SRX IDs to a file, parse it, and use each ID in a new esearch query:

esearch -db sra -query "SRX300901[ACCN]" | efetch -format docsum | xtract  -element Runs

 <Run acc="SRR893074" total_spots="22020236" total_bases="3963642480" load_done="true" is_public="true" cluster_name="public" static_data_available="true"/> 

That does not quite get you there, since the SRR is contained in an attribute. You may want to use the XML parser of your choice, rather than EDirect xtract, to process the XML returned by efetch.
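If a full XML parser feels like overkill, a plain text filter can also pull the accession out of the attribute. This is only a minimal sketch: extract_run_acc is a hypothetical helper name (not part of EDirect), and it assumes that acc is the first attribute of each Run element, as in the output shown above.

```shell
#!/usr/bin/env bash
# Sketch: read XML containing <Run .../> elements on stdin and print one
# run accession per line.
# extract_run_acc is a hypothetical helper, not an EDirect command.
# Assumption: acc is the first attribute of <Run>, as in the docsum output above.
extract_run_acc() {
  # Keep only the leading acc attribute of each <Run> tag,
  # then take the text between the double quotes.
  grep -o '<Run acc="[^"]*"' | cut -d'"' -f2
}

# Example input: the Run element from above, with some attributes dropped.
printf '%s\n' '<Run acc="SRR893074" total_spots="22020236" is_public="true"/>' \
  | extract_run_acc
```

Run as-is, this prints SRR893074. In a real pipeline the function would sit at the end of the esearch | efetch command instead of the printf.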

 

Another approach that I have not yet explored: it may be possible to parse a GEO SOFT or MINiML file, which should be obtainable from the FTP site using the original GSE accession.
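A rough sketch of what the SOFT route might look like, untested against a real download. It assumes that the family SOFT file carries one "!Sample_relation = SRA: ..." line per sample whose URL ends in the SRX accession; please check that against an actual file before relying on it. soft_to_srx is a hypothetical helper name, and the GSM and BioSample values in the sample input are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: read GEO SOFT text on stdin and print the unique SRX accessions.
# soft_to_srx is a hypothetical helper name.
# Assumption: each sample has a line of the form
#   !Sample_relation = SRA: <URL ending in the SRX accession>
soft_to_srx() {
  grep '^!Sample_relation = SRA:' | grep -o 'SRX[0-9][0-9]*' | sort -u
}

# Placeholder SOFT fragment (the GSM and SAMN values are made up).
printf '%s\n' \
  '^SAMPLE = GSM0000001' \
  '!Sample_relation = BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN00000001' \
  '!Sample_relation = SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX300901' \
  | soft_to_srx
```

With a real file, something like "gunzip -c GSE44183_family.soft.gz | soft_to_srx" would replace the printf; the file name here is also an assumption about how the FTP site names its SOFT files.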

 

modified 5.6 years ago • written 5.6 years ago by Neilfws48k
3

Hi,

Thanks for the help.

I solved it by doing this: 

 esearch -db sra -query "GSE52529" | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc

written 5.6 years ago by tomislav.ilicic120
2
4.3 years ago by
Kamil1.9k
Boston
Kamil1.9k wrote:

Thanks to Neil and Tomislav for the helpful comments! I use this script to download all SRA files for a given SRA id:

written 4.3 years ago by Kamil1.9k
1

Hi Kamil, could you make a small modification to capture the sample name at the same time? For example SRS, 'Sperm', ...

written 16 months ago by Shicheng Guo7.8k
2
11 weeks ago by
j.aryaman2520
j.aryaman2520 wrote:

This code will get all SRR identifiers from a GSE:

#!/usr/bin/env bash

# gse2srr.sh
# Requires entrez-direct
# conda install -c bioconda entrez-direct

# To use,
# bash gse2srr.sh GSE52529
# This will create a text file GSE52529_SRR.txt

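GSE=$1
# Stop early if no accession was given
if [ -z "${GSE}" ]; then
   echo "Usage: bash gse2srr.sh <GSE accession>" >&2
   exit 1
fi
echo "Finding all SRX associated with ${GSE}..."

# One SRX accession per array element
mapfile -t SRX_ARRAY < <(esearch -db gds -query "${GSE}[ACCN] AND GSM[ETYP]" |\
efetch -format docsum | xtract -pattern ExtRelation -element TargetObject)

echo "Finding all SRR associated with ${GSE}..."

# Start from an empty output file
rm -f "${GSE}_SRR.txt"

for i in "${SRX_ARRAY[@]}"
do
   echo "$i"
   # Quote the accession so it is always passed as a single query argument
   esearch -db sra -query "$i" | efetch -format docsum | \
   xtract -pattern DocumentSummary -element Run@acc >> "${GSE}_SRR.txt"
done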

It is a bit slow because it does a database query for every SRX. I would be stunned if there isn't a faster way to do this, but it at least answers the question.
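One possible way to cut down the number of round trips is to join the SRX accessions into a single OR query and search once. This is a sketch only: build_sra_query is a hypothetical helper name, and I have not checked how long a query esearch will accept, so a large series may need to be split into chunks.

```shell
#!/usr/bin/env bash
# Sketch: turn a list of SRX accessions (one per line on stdin) into a single
# Entrez query of the form "SRX1[ACCN] OR SRX2[ACCN] OR ...".
# build_sra_query is a hypothetical helper name.
build_sra_query() {
  # sep is empty for the first accession and " OR " for every later one.
  awk 'NF { printf "%s%s[ACCN]", sep, $1; sep = " OR " }
       END { if (sep) print "" }'
}

query=$(printf '%s\n' SRX300901 SRX300900 SRX300899 | build_sra_query)
echo "$query"

# With EDirect installed, the combined query could then be run once
# (left commented out because it needs network access):
# esearch -db sra -query "$query" | efetch -format docsum | \
#   xtract -pattern DocumentSummary -element Run@acc
```

The echo prints SRX300901[ACCN] OR SRX300900[ACCN] OR SRX300899[ACCN]. In the script above, printf '%s\n' "${SRX_ARRAY[@]}" would feed the helper in place of the hard-coded accessions.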

modified 11 weeks ago • written 11 weeks ago by j.aryaman2520

Given you have the study accession, e.g. PRJNA288801, you can simply look it up at the ENA and then do a fast download as described in this tutorial:

Fast download of FASTQ files from the European Nucleotide Archive (ENA)
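For the lookup step, a sketch of how the ENA file report for a study might be turned into a list of FASTQ URLs. Several things here are assumptions rather than anything from the tutorial: ena_fastq_urls is a hypothetical helper name, the report is assumed to be a tab-separated table with a fastq_ftp column holding semicolon-separated paths without a scheme, and the sample rows use placeholder accessions and hosts.

```shell
#!/usr/bin/env bash
# Sketch: read an ENA file report (TSV with a header row) on stdin and print
# one FASTQ URL per line.
# ena_fastq_urls is a hypothetical helper name.
# Assumptions: the header names a column "fastq_ftp", and its values are
# semicolon-separated paths without the leading "ftp://".
ena_fastq_urls() {
  awk -F'\t' '
    # Header row: find which column is fastq_ftp.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
    # Data rows: split paired-end entries and print each file separately.
    col && $col != "" {
      n = split($col, parts, ";")
      for (j = 1; j <= n; j++) print "ftp://" parts[j]
    }
  '
}

# Placeholder report (run accession and host are made up).
printf 'run_accession\tfastq_ftp\nSRR0000001\tftp.example.org/SRR0000001_1.fastq.gz;ftp.example.org/SRR0000001_2.fastq.gz\n' \
  | ena_fastq_urls

# A real report could be fetched with something like (needs network access):
# curl -s "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJNA288801&result=read_run&fields=run_accession,fastq_ftp&format=tsv" | ena_fastq_urls
```

The resulting URL list can then be handed to whichever downloader the tutorial recommends.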

written 11 weeks ago by ATpoint26k