Use esearch/efetch to output a table relating GSM IDs to SRR IDs (basically SRA file names)
9.9 years ago
predeus ★ 1.9k

Hello all,

I want to use the NCBI command-line utilities (esearch, efetch, etc.) to do the following: given a GSE ID, produce a simple tab-separated output with the GSM experiment in column 1 and the SRR file name in column 2. Ideally, the output would also be collapsed by GSM ID (for cases where there is more than one SRA file per GSM).

I'm reading through numerous manual pages that are fairly obscure for people without much database experience, so if you can help me figure this out, I'd be most grateful :)

Thank you in advance!

GEO GEO-omnibus NCBI
9.9 years ago
predeus ★ 1.9k

OK, I've figured it out. To get the GSM-to-SRR relationship, you can use this:

esearch -db sra -query "GSM123456" | efetch -format docsum | xtract -pattern DocumentSummary -element Runs | perl -ne '@mt = ($_ =~ /SRR\d+/g); print "@mt\n"'

Note that if there is more than one SRA file per GSM, it prints all the SRR IDs on one line, separated by spaces.
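A minimal sketch, assuming Entrez Direct (esearch, efetch, xtract) is installed and on your PATH, that wraps the command above in a Bash loop to produce the tab-separated table from the question, collapsed to one line per GSM. The function name `gsm_to_srr` is hypothetical, and `grep -oE` stands in for the Perl regex, so no Perl is needed for this step. It does not look up the GSM IDs for a GSE; you pass them in yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EDirect): for each GSM ID given as an
# argument, print one line "GSM<TAB>SRR1,SRR2,..." with all of its runs.
# Assumes esearch, efetch and xtract from Entrez Direct are on PATH.
gsm_to_srr() {
  local gsm runs
  for gsm in "$@"; do
    # Same query as the one-liner above; grep -oE pulls out each SRR ID,
    # sort -u removes duplicates, paste joins them with commas.
    runs=$(esearch -db sra -query "$gsm" \
      | efetch -format docsum \
      | xtract -pattern DocumentSummary -element Runs \
      | grep -oE 'SRR[0-9]+' \
      | sort -u \
      | paste -s -d, -)
    # A GSM with no SRR runs found gets an empty second column.
    printf '%s\t%s\n' "$gsm" "$runs"
  done
}
```

For example, `gsm_to_srr GSM123456 GSM123457 > gsm_srr.tsv` would write one row per GSM. Joining the runs with commas keeps the file strictly two columns, which is easier to load into other tools than a variable number of space-separated fields.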


Can you give more information on:

  1. whether we need a Perl script to run the command,
  2. whether it can be run in Cygwin, and
  3. a URL for the Perl script, if one exists.

Thank you.


Hello,

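Here's the documentation for the scripts I mentioned: http://www.ncbi.nlm.nih.gov/books/NBK179288/

They are called Entrez Direct (EDirect). No separate Perl script is needed: the `perl -ne '...'` part of the command is an inline one-liner, so it only requires Perl to be installed.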

Hope this helps,

-- Alex
