Get SRA IDs from GEO
8.7 years ago

Hi,

I am trying to get all sample ids (SRS ids) from a GEO ID. For example, I am trying to fetch all SRS ids belonging to GSE44183.

Is there any way to get these programmatically? I tried using the NCBI E-utilities, but I couldn't come up with the right query.

Help would be very much appreciated. Best, Tomi

geo ncbi • 8.0k views

You need to clarify this question. First, the title refers to "SRA ids". However, the question then uses "SRS ids", twice. Which is it? I suspect SRA.

Second, you need to define and give an example of the "sample ids" you want to retrieve. For this type of GEO record, one could retrieve GEO sample IDs (starting with GSM), or SRA read IDs (starting with SRR), or even SRX IDs. So please, define clearly what you want to do.

Hi,

I apologize, I got a bit confused with the number of different ids in this case.

I have around 40 GSE IDs, and for each one (e.g. GSE44183) I want to download all of its sequencing data. To do this, I planned to use fastq-dump, which needs SRA IDs as input. Hence, I am trying to fetch all SRA IDs belonging to a GSE. Maybe this is not the right approach, but I couldn't think of an easier way to download all the data.

Hope this is clear now.

So you want sequencing run accessions, i.e. SRR?

Yes. From GSE ids.

8.6 years ago
Neilfws 49k

Not sure that you can get from GSE to SRR in one step, but EUtils is definitely the way to go.

You can get from GSE to SRX using EDirect like this (using head to show the first 5 results):

esearch -db gds -query "GSE44183[ACCN] AND GSM[ETYP]" | efetch -format docsum | \
xtract -pattern ExtRelation -element RelationType,TargetObject | head -5

SRA    SRX300901
SRA    SRX300900
SRA    SRX300899
SRA    SRX300898
SRA    SRX300897

Then you could write the SRX IDs to a file, parse it, and use each ID in a new esearch query:

esearch -db sra -query "SRX300901[ACCN]" | efetch -format docsum | xtract -element Runs

<Run acc="SRR893074" total_spots="22020236" total_bases="3963642480" load_done="true" is_public="true" cluster_name="public" static_data_available="true"/>

That does not quite get you there, since the SRR is contained in an attribute. You may want to use the XML parser of your choice, rather than EDirect xtract, to process the XML returned by efetch.
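If you would rather stay in the shell, a small grep/sed filter can pull the accession out of that attribute. This is only a minimal sketch and not a real XML parser: it assumes each Run element carries an acc="..." attribute written as in the output above, and the function name extract_run_acc is made up for illustration.

```shell
#!/usr/bin/env bash

# extract_run_acc (hypothetical helper): read text containing <Run .../>
# elements on stdin and print the value of each acc attribute, one per line.
# Assumes the attribute looks like acc="SRR..." as in the docsum output.
extract_run_acc() {
    grep -o '<Run [^>]*acc="[^"]*"' | sed 's/.*acc="\([^"]*\)".*/\1/'
}

# Example with the Run element shown above; this prints SRR893074.
echo '<Run acc="SRR893074" total_spots="22020236" total_bases="3963642480" load_done="true" is_public="true" cluster_name="public" static_data_available="true"/>' | extract_run_acc
```

In a pipeline this would sit after the xtract step, e.g. `... | xtract -element Runs | extract_run_acc`.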

Another approach that I have not yet explored: it may be possible to parse a GEO SOFT or MINiML file, which should be obtainable from the FTP site using the original GSE accession.
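A rough sketch of the SOFT idea, under assumptions I have not checked against this particular series: that the series "family" SOFT file (something like GSE44183_family.soft.gz) has one line per sample of the form `!Sample_relation = SRA: <URL ending in the SRX accession>`. The function name srx_from_soft is hypothetical.

```shell
#!/usr/bin/env bash

# srx_from_soft (hypothetical helper): read an uncompressed SOFT file on stdin
# and print the unique experiment accessions found on SRA relation lines.
# Assumes lines such as:
#   !Sample_relation = SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX300901
srx_from_soft() {
    grep '^!Sample_relation = SRA:' | grep -oE '[SED]RX[0-9]+' | sort -u
}

# Possible usage once the file has been downloaded (left commented out):
# gunzip -c GSE44183_family.soft.gz | srx_from_soft > GSE44183_SRX.txt
```

This still only gets you to SRX, so the SRX to SRR step above is needed afterwards.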

Hi,

Thanks for the help.

I solved it by doing this:

esearch -db sra -query "GSE52529" | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc
7.3 years ago
Kamil ★ 2.1k

Thanks to Neil and Tomislav for the helpful comments! I use this script to download all SRA files for a given SRA id:

Hi Kamil, could you make a small modification to capture the sample name at the same time? For example, SRS, 'Sperm' ...

3.3 years ago
j.aryaman25 ▴ 10

This code will get all SRR identifiers from a GSE:

#!/usr/bin/env bash

# gse2srr.sh
# Requires entrez-direct
# conda install -c bioconda entrez-direct

# To use,
# bash gse2srr.sh GSE52529
# This will create a text file GSE52529_SRR.txt

# Stop with a usage message if no accession was given
GSE="${1:?Usage: bash gse2srr.sh <GSE accession>}"
echo "Finding all SRX associated with ${GSE}..."

mapfile -t SRX_ARRAY < <(esearch -db gds -query "${GSE}[ACCN] AND GSM[ETYP]" |\
efetch -format docsum | xtract -pattern ExtRelation -element TargetObject)

echo "Finding all SRR associated with ${GSE}..."

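# Start from a clean output file
rm -f "${GSE}_SRR.txt"

for i in "${SRX_ARRAY[@]}"
do
   echo "$i"
   # Quote the accession and file name so each is passed as a single argument
   esearch -db sra -query "$i" | efetch -format docsum | \
   xtract -pattern DocumentSummary -element Run@acc >> "${GSE}_SRR.txt"
done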

It is a bit slow because it does a database query for every SRX. I would be stunned if there isn't a faster way to do this, but it at least answers the question.
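One possible way to cut down the number of round trips is to join the SRX accessions into a single OR query and send it once. This is only a sketch: build_or_query and srr_for_srx_batch are hypothetical names, and I have not checked how long a query esearch will accept, so a large series may need to be split into chunks.

```shell
#!/usr/bin/env bash

# build_or_query (hypothetical helper): join accessions into one Entrez query,
# e.g. "SRX1[ACCN] OR SRX2[ACCN]".
build_or_query() {
    local query="" acc
    for acc in "$@"; do
        if [ -n "$query" ]; then
            query+=" OR "
        fi
        query+="${acc}[ACCN]"
    done
    printf '%s\n' "$query"
}

# srr_for_srx_batch (hypothetical helper): one esearch/efetch round trip for
# all accessions given as arguments. Requires entrez-direct and network access.
srr_for_srx_batch() {
    esearch -db sra -query "$(build_or_query "$@")" | efetch -format docsum |
        xtract -pattern DocumentSummary -element Run@acc
}

# Possible usage inside gse2srr.sh, in place of the for loop (commented out):
# srr_for_srx_batch "${SRX_ARRAY[@]}" > "${GSE}_SRR.txt"
```

Note that Tomi's one-liner earlier in the thread, which queries the sra database with the GSE accession directly, is also a single request.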

If you have the study accession, e.g. PRJNA288801, you can simply look it up at the ENA and then do a fast download as described in this tutorial:

Fast download of FASTQ files from the European Nucleotide Archive (ENA)
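As a rough illustration of the lookup step (not necessarily what the tutorial does), the ENA Portal API has a filereport endpoint that returns a tab-separated table of runs for an accession. The helper names below are hypothetical, and the sketch assumes the fastq_ftp column lists a run's files separated by ";".

```shell
#!/usr/bin/env bash

# ena_filereport_url (hypothetical helper): print the ENA Portal API
# filereport URL listing runs and FASTQ paths for one accession.
ena_filereport_url() {
    local accession="$1"
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$accession"
}

# fastq_paths_from_report (hypothetical helper): read the TSV report on stdin,
# find the fastq_ftp column by its header, and print one path per line.
# Runs with an empty fastq_ftp field are skipped.
fastq_paths_from_report() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        col && $col != "" { n = split($col, f, ";"); for (j = 1; j <= n; j++) print f[j] }
    '
}

# Possible usage (needs curl and network access, so it is left commented out):
# curl -s "$(ena_filereport_url PRJNA288801)" | fastq_paths_from_report
```

The resulting paths can then be handed to whichever download tool you prefer.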
