Question: Programmatic access to Sample Information from GEO via GSM or SRR number
18 months ago by
United States
seidel wrote:

How does one programmatically retrieve all sample information for a given sample using either the GSM number or the SRR number? I've used esearch to get runinfo, and I can map from GSE series identifiers to GSM sample IDs, and from PRJNA identifiers to the corresponding SRR numbers. But for a given ChIP-seq experiment, the antibody used for a sample is nowhere in that information and appears only as a sample attribute on a web page:

GEO WEB page

How does one get programmatic access to any arbitrary sample attribute given a GSM or SRR id?

18 months ago by
United States
vkkodali wrote:

You can query BioSample with a GSM accession and parse the BioSample docsum to extract this information, as follows:

esearch -db biosample -query 'GSM3143747' \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

BioSample: SAMN09214109; SRA: SRS3306622; GEO: GSM3143747       Yeast Cell      seb1-1 epe1delta tfs1DN 30 C    Anti-H3K9me2 (abcam ab1220)

On the other hand, if all you have are SRR accessions, you can get to BioSample using elink and then fetch the docsum with esummary, as follows:

esearch -db sra -query 'SRR7172016' \
  | elink -db sra -target biosample -name sra_biosample \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

Excellent! This record has 80 samples so I can just write a loop and parse the results. There are some arguments in your example I didn't know about. Thanks!
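
A minimal sketch of such a loop, assuming a plain-text file with one GSM accession per line. The function name `gsm_attributes` and the input file are hypothetical and not from this thread. The `< /dev/null` redirect is there because EDirect commands may read standard input, which inside a `while read` loop could swallow the rest of the accession list:

```shell
# Sketch: print the BioSample identifiers and attributes for each GSM accession.
# Reads accessions, one per line, from the file named in $1.
# The function name and the input file are hypothetical.
gsm_attributes() {
  while IFS= read -r gsm; do
    # Skip blank lines in the input file.
    [ -z "$gsm" ] && continue
    # Redirect stdin from /dev/null so esearch cannot consume the loop's input.
    attrs=$(esearch -db biosample -query "$gsm" < /dev/null \
      | esummary \
      | xtract -pattern DocumentSummary -element Identifiers,Attribute)
    # Prefix the result with the accession that was queried.
    printf '%s\t%s\n' "$gsm" "$attrs"
  done < "$1"
}
```

Calling it as `gsm_attributes gsm_ids.txt > attributes.tsv` would write the results to a tab-separated file. An accession that matches several BioSample records would produce more than one line of output.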

Reply written 18 months ago by seidel

If you are planning to write a loop, be sure to check out the sections on While Loop and For Loop here:

Reply written 18 months ago by vkkodali
18 months ago by
United States
genomax wrote:

Using EntrezDirect you can pull up information like this.

$ esearch -db sra -query "GSM3143747" | esummary | xtract -pattern DocumentSummary -element LIBRARY_STRATEGY,LIBRARY_CONSTRUCTION_PROTOCOL
ChIP-Seq    ChIP DNA was extracted by bead beater with 0.5mm zirconia beads.  ChIP DNa was isolated by antibodies directed to our protein of interest (H3K9me2 or rpb3x-FLAG).  DNA was then isolated by incubation by SDS/proteinase K followed by column purification (Macherey-Nagel Nucleospin Gel Cleanup Columns). Libraries were prepared using End-It DNA End Repair Kit (Epicenter) followed by A-tailing using Klenow fragment (NEB), ligation to Illumina adaptors using Rapid T4 DNA Ligase (Enzymatics), PCR amplification for 15 cycles, and size selection between 200-350bp by bead purification and gel extraction.

@vkkodali may be by later to provide a more refined answer.
