Question

Programmatic access to Sample Information from GEO via GSM or SRR number

0

5.3 years ago

seidel 11k

How does one programmatically access all sample information for a given sample using either its GSM number or its SRR number? I've used esearch to get runinfo, mapped GSE series identifiers to GSM sample IDs, and used PRJNA identifiers to get the corresponding SRR numbers. But for a given ChIP-Seq experiment, the antibody used for a sample appears nowhere in that information; it shows up only as a sample attribute on a web page:

GEO WEB page

How does one get programmatic access to any arbitrary sample attribute given a GSM or SRR id?

GEO • 2.4k views

ADD COMMENT • link updated 4.8 years ago by Biostar 20 • written 5.3 years ago by seidel 11k

0

5.3 years ago

GenoMax 141k

Using Entrez Direct (EDirect), you can pull up information like this:

$ esearch -db sra -query "GSM3143747" | esummary | xtract -pattern DocumentSummary -element LIBRARY_STRATEGY,LIBRARY_CONSTRUCTION_PROTOCOL
ChIP-Seq    ChIP DNA was extracted by bead beater with 0.5mm zirconia beads.  ChIP DNa was isolated by antibodies directed to our protein of interest (H3K9me2 or rpb3x-FLAG).  DNA was then isolated by incubation by SDS/proteinase K followed by column purification (Macherey-Nagel Nucleospin Gel Cleanup Columns). Libraries were prepared using End-It DNA End Repair Kit (Epicenter) followed by A-tailing using Klenow fragment (NEB), ligation to Illumina adaptors using Rapid T4 DNA Ligase (Enzymatics), PCR amplification for 15 cycles, and size selection between 200-350bp by bead purification and gel extraction.

@vkkodali may be by later to provide a more refined answer.

score 2 · Accepted Answer · 2019-01-10

2

5.3 years ago

vkkodali_ncbi ★ 3.7k

You can query BioSample using GSM accessions and parse the BioSample docsum to extract this information as follows:

esearch -db biosample -query 'GSM3143747' \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

BioSample: SAMN09214109; SRA: SRS3306622; GEO: GSM3143747       Yeast Cell      seb1-1 epe1delta tfs1DN 30 C    Anti-H3K9me2 (abcam ab1220)
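
The output above lists attribute values without their names, so it is not obvious which column holds the antibody. One way to label them is sketched below. It assumes that xtract's `-block`, `-sep`, and `Tag@attribute` notation behave as described in the EDirect documentation and that the BioSample docsum stores each name in an `attribute_name` attribute. The function names `labelled_attributes` and `pick_attribute` are made up for illustration, and the attribute name in the usage comment is a guess, so check the labelled output first.

```shell
#!/usr/bin/env bash

# Print every attribute of one sample as tab-separated name=value fields.
# Assumption: each <Attribute> element carries an attribute_name attribute.
labelled_attributes() {
  esearch -db biosample -query "$1" < /dev/null \
    | esummary \
    | xtract -pattern DocumentSummary -block Attribute \
        -sep "=" -element Attribute@attribute_name,Attribute
}

# Read tab-separated name=value fields on stdin and print the value
# whose name equals the first argument.
pick_attribute() {
  awk -F '\t' -v key="$1" '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")                      # position of the first "="
      if (eq > 0 && substr($i, 1, eq - 1) == key)
        print substr($i, eq + 1)
    }
  }'
}

# Usage (the attribute name "chip antibody" is a guess, not taken from the record):
#   labelled_attributes GSM3143747 | pick_attribute "chip antibody"
```

Splitting on the first `=` keeps values that themselves contain an equals sign intact.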

On the other hand, if all you have are SRR accessions, you can get to BioSample using elink as follows:

esearch -db sra -query 'SRR7172016' \
  | elink -db sra -target biosample -name sra_biosample \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

0

Excellent! This record has 80 samples so I can just write a loop and parse the results. There are some arguments in your example I didn't know about. Thanks!

ADD REPLY • link 5.3 years ago by seidel 11k

1

If you are planning to write a loop, be sure to check out the sections on While Loop and For Loop here: https://www.ncbi.nlm.nih.gov/books/NBK179288/#chapter6.Automation
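
A loop over many accessions could be sketched roughly as follows. This is a sketch under stated assumptions, not a tested pipeline. The names `fetch_attributes`, `annotate_samples`, `EDIRECT_DELAY`, and `gsm_list.txt` are made up for illustration. Redirecting stdin from `/dev/null` is a precaution so that the EDirect tools cannot consume the accession list the loop is reading.

```shell
#!/usr/bin/env bash

# Fetch identifiers and attribute values for one accession
# (the same BioSample pipeline as in the answer above).
fetch_attributes() {
  esearch -db biosample -query "$1" < /dev/null \
    | esummary \
    | xtract -pattern DocumentSummary -element Identifiers,Attribute
}

# Read accessions (one per line) from stdin and print each result
# prefixed with the accession it came from.
annotate_samples() {
  local gsm
  while IFS= read -r gsm; do
    [ -z "$gsm" ] && continue              # skip blank lines
    printf '%s\t%s\n' "$gsm" "$(fetch_attributes "$gsm")"
    sleep "${EDIRECT_DELAY:-1}"            # hypothetical knob: pause between requests
  done
}

# Usage (gsm_list.txt is a hypothetical file with one GSM accession per line):
#   annotate_samples < gsm_list.txt > sample_attributes.tsv
```

Prefixing each line with the accession makes it easy to spot samples for which the query returned nothing.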

ADD REPLY • link 5.3 years ago by vkkodali_ncbi ★ 3.7k