Question: Programmatic access to Sample Information from GEO via GSM or SRR number
seidel (United States) wrote 3 months ago:

How does one programmatically get access to all sample information for a given sample using either the GSM number or the SRR number? I've used esearch to get runinfo, and mapped from GSE series identifiers to GSM sample IDs and from PRJNA SRA identifiers to the corresponding SRR numbers, etc. But for a given ChIP-seq experiment, the antibody used for a given sample is nowhere among any of that information and only appears as a sample attribute on a web page:

GEO WEB page

How does one get programmatic access to any arbitrary sample attribute given a GSM or SRR id?

vkkodali (United States) wrote 3 months ago:

You can query the BioSample database with a GSM accession and parse the BioSample document summary (docsum) to extract this information as follows:

esearch -db biosample -query 'GSM3143747' \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

BioSample: SAMN09214109; SRA: SRS3306622; GEO: GSM3143747       Yeast Cell      seb1-1 epe1delta tfs1DN 30 C    Anti-H3K9me2 (abcam ab1220)
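
If only a couple of fields are needed from each output line, the line can be split further. The following is a minimal sketch, not part of the original answer. It assumes xtract's default tab-separated columns with Identifiers in the first column, and it assumes that the attribute of interest (here the antibody) happens to be the last column, as in the record above. Attribute order can differ between records, so check a few samples before relying on it. The function name geo_and_last_attribute is made up for illustration.

```shell
# Reduce each line of xtract output to "<GEO accession><TAB><last attribute>".
# Assumption: columns are tab-separated and the first column looks like
# "BioSample: ...; SRA: ...; GEO: GSM...".
geo_and_last_attribute() {
  awk -F '\t' '{
    geo = ""
    # Split the Identifiers column into its "; "-separated parts.
    n = split($1, ids, "; ")
    for (i = 1; i <= n; i++)
      if (ids[i] ~ /^GEO: /) geo = substr(ids[i], 6)   # drop the "GEO: " prefix
    print geo "\t" $NF
  }'
}
```

It would be used by appending `| geo_and_last_attribute` to the pipeline above.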

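On the other hand, if all you have are SRR accessions, you can get to BioSample using elink as follows (the esummary step is needed so that xtract receives DocumentSummary records rather than the bare link results):

esearch -db sra -query 'SRR7172016' \
  | elink -db sra -target biosample -name sra_biosample \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute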

Excellent! This record has 80 samples so I can just write a loop and parse the results. There are some arguments in your example I didn't know about. Thanks!

Reply written 3 months ago by seidel

If you are planning to write a loop, be sure to check out the sections on While Loop and For Loop here:

Reply written 3 months ago by vkkodali
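
For readers who want a starting point for the loop discussed in these replies, here is a rough sketch under stated assumptions, not something posted in the thread. It assumes EDirect (esearch, esummary, xtract) is on the PATH and that the GSM accessions sit one per line in a text file. The function names and the EDIRECT_PAUSE variable are hypothetical. Each output line is prefixed with the accession that was queried so results can be matched back to samples.

```shell
# Print "<accession><TAB><identifiers><TAB><attributes...>" for one GSM accession.
gsm_attributes() {
  acc=$1
  esearch -db biosample -query "$acc" \
    | esummary \
    | xtract -pattern DocumentSummary -element Identifiers,Attribute \
    | while IFS= read -r line; do
        printf '%s\t%s\n' "$acc" "$line"
      done
}

# Run gsm_attributes for every accession listed in the file given as $1.
gsm_attributes_from_file() {
  while IFS= read -r gsm; do
    # Skip blank lines and lines starting with '#'.
    case $gsm in ''|'#'*) continue ;; esac
    # Redirect stdin so the pipeline cannot swallow the rest of the list.
    gsm_attributes "$gsm" </dev/null
    # Pause between samples to stay under NCBI's request-rate limit
    # (EDIRECT_PAUSE is a made-up knob for this sketch; default 1 second).
    sleep "${EDIRECT_PAUSE:-1}"
  done <"$1"
}

# Only run when a list file is passed as the first argument.
if [ -n "${1:-}" ] && [ -f "$1" ]; then
  gsm_attributes_from_file "$1"
fi
```

Saved as a script, this could be called with a file of accessions and its output redirected to a TSV file. The `</dev/null` redirection is there because a command inside a `while read` loop that reads standard input can otherwise consume the remaining lines, leaving only the first accession processed.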
genomax (United States) wrote 3 months ago:

Using Entrez Direct (EDirect), you can pull up information like this:

$ esearch -db sra -query "GSM3143747" | esummary | xtract -pattern DocumentSummary -element LIBRARY_STRATEGY,LIBRARY_CONSTRUCTION_PROTOCOL
ChIP-Seq    ChIP DNA was extracted by bead beater with 0.5mm zirconia beads.  ChIP DNa was isolated by antibodies directed to our protein of interest (H3K9me2 or rpb3x-FLAG).  DNA was then isolated by incubation by SDS/proteinase K followed by column purification (Macherey-Nagel Nucleospin Gel Cleanup Columns). Libraries were prepared using End-It DNA End Repair Kit (Epicenter) followed by A-tailing using Klenow fragment (NEB), ligation to Illumina adaptors using Rapid T4 DNA Ligase (Enzymatics), PCR amplification for 15 cycles, and size selection between 200-350bp by bead purification and gel extraction.

@vkkodali may be by later to provide a more refined answer.
