Dear all,
I have an extremely long list of unannotated CEL files and I would like to fetch metadata from GEO in bulk. Any advice for that?
Thanks in advance!
You can use the Entrez utilities to get the information you need:
$ esearch -db gds -query GSM85508 | efetch
1. Basal-like breast cancer tumors
Analysis of sporadic basal-like cancer (BLC), BRCA-associated breast cancer, and non-BLC tumors. Sporadic BLC are phenotypically similar to BRCA1-associated cancers. Results provide insight into the molecular pathogenesis of BLC and BRCA1-associated breast cancer.
Organism: Homo sapiens
Type: Expression profiling by array, transformed count, 4 disease state sets
Platform: GPL570 Series: GSE3744 47 Samples
FTP download: GEO (CEL) ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS2nnn/GDS2250/
DataSet Accession: GDS2250 ID: 2250
2. Human breast tumor expression
(Submitter supplied) Gene expression for 47 human breast tumor cases; (* normalized by GCRMA for global expression analysis) Keywords: Type
Organism: Homo sapiens
Type: Expression profiling by array
Dataset: GDS2250 Platform: GPL570 47 Samples
FTP download: GEO (CEL) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE3nnn/GSE3744/
Series Accession: GSE3744 ID: 200003744
3. [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array
(Submitter supplied) Affymetrix submissions are typically submitted to GEO using the GEOarchive method described at http://www.ncbi.nlm.nih.gov/projects/geo/info/geo_affy.html June 03, 2009: annotation table updated with netaffx build 28 June 06, 2012: annotation table updated with netaffx build 32 June 23, 2016: annotation table updated with netaffx build 35 Protocol: see manufacturer's web site Complete coverage of the Human Genome U133 Set plus 6,500 additional genes for analysis of over 47,000 transcripts All probe sets represented on the GeneChip Human Genome U133 Set are identically replicated on the GeneChip Human Genome U133 Plus 2.0 Array. more...
Organism: Homo sapiens
602 DataSets 4699 Series 58 Related Platforms 133119 Samples
FTP download: GEO ftp://ftp.ncbi.nlm.nih.gov/geo/platforms/GPLnnn/GPL570/
Platform Accession: GPL570 ID: 100000570
4. T92 U133p2
Organism: Homo sapiens
Source name: T92
Platform: GPL570 Series: GSE3744 Dataset: GDS2250
FTP download: GEO (CEL) ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM85nnn/GSM85508/
Sample Accession: GSM85508 ID: 300085508
To do a batch search you'll need to write a script to read each of your GSM accessions and send a query to NCBI with the above command.
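A minimal sketch of such a script, assuming the EDirect tools (`esearch`, `efetch`) are on your PATH and that your accessions are stored one per line. The helper names and the half-second pause are illustrative choices, not part of EDirect:

```python
import subprocess
import time


def read_accessions(lines):
    """Return unique GSM accessions in input order, skipping blank lines."""
    seen = set()
    accessions = []
    for line in lines:
        acc = line.strip()
        if acc and acc not in seen:
            seen.add(acc)
            accessions.append(acc)
    return accessions


def esearch_command(accession):
    """Argument list equivalent to: esearch -db gds -query <accession>"""
    return ["esearch", "-db", "gds", "-query", accession]


def fetch_record(accession):
    """Run `esearch ... | efetch` for one accession and return the text output."""
    search = subprocess.run(esearch_command(accession),
                            capture_output=True, text=True, check=True)
    # efetch reads the esearch result from standard input, as in the pipe above.
    fetch = subprocess.run(["efetch"], input=search.stdout,
                           capture_output=True, text=True, check=True)
    return fetch.stdout


def fetch_all(accessions, delay=0.5):
    """Query NCBI once per accession, pausing between requests."""
    results = {}
    for acc in accessions:
        results[acc] = fetch_record(acc)
        time.sleep(delay)  # assumed courtesy pause; adjust to NCBI's usage limits
    return results
```

For example, with a hypothetical file `gsm_ids.txt`, `fetch_all(read_accessions(open("gsm_ids.txt")))` would return a dict mapping each accession to the text shown above.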
I believe GEOmetadb could help you:
https://gbnci-abcc.ncifcrf.gov/geo/gsm.php
R package: http://bioconductor.org/packages/2.2/bioc/html/GEOmetadb.html
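GEOmetadb is distributed as an SQLite file, so once you have it locally you can also query it outside R. A rough sketch in Python follows; the table name `gsm` and the column names are my assumptions about the GEOmetadb schema, so check them against your copy (e.g. with `PRAGMA table_info(gsm)`) before relying on this:

```python
import sqlite3

# Assumed column names of the GEOmetadb `gsm` table; verify against your file.
COLUMNS = ["gsm", "title", "series_id", "gpl", "source_name_ch1", "organism_ch1"]


def lookup_samples(conn, accessions, chunk_size=500):
    """Return one dict per GSM accession found in the `gsm` table."""
    rows = []
    # Query in chunks so a very long list stays under SQLite's parameter limit.
    for start in range(0, len(accessions), chunk_size):
        chunk = accessions[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        query = (f"SELECT {', '.join(COLUMNS)} FROM gsm "
                 f"WHERE gsm IN ({placeholders})")
        for row in conn.execute(query, chunk):
            rows.append(dict(zip(COLUMNS, row)))
    return rows
```

Usage would look like `lookup_samples(sqlite3.connect("GEOmetadb.sqlite"), my_gsm_ids)`, where the file path and `my_gsm_ids` are placeholders for your own download and accession list.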
Solved: It's quite tricky
e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=sra&id=010170,..
First you extract the experiment ID associated with the GSM ID, then you redo the search to extract the values for the specific term you are interested in. If anyone has a faster solution, it would be very welcome.
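The two-step lookup described here could be sketched roughly as follows with the E-utilities HTTP endpoints. I use `db=gds` as in the EDirect answer rather than `db=sra` (that choice is my assumption), and the function names are made up for illustration:

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession, db="gds"):
    """Step 1: URL that searches for the internal UIDs matching an accession."""
    params = {"db": db, "term": accession, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def esummary_url(uids, db="gds"):
    """Step 2: URL that requests summaries for a list of UIDs."""
    params = {"db": db, "id": ",".join(uids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urllib.parse.urlencode(params)}"


def parse_uids(esearch_json):
    """Pull the UID list out of an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def get_text(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def summarize(accession, db="gds"):
    """Run both steps for one accession; returns the parsed esummary JSON."""
    uids = parse_uids(get_text(esearch_url(accession, db)))
    if not uids:
        return {}
    return json.loads(get_text(esummary_url(uids, db)))
```

Since esummary accepts a comma-separated ID list, you can collect the UIDs for many GSMs first and then request the summaries in batches instead of one call per sample.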
Could you post the GSE ID? I'll try and get back to you.
I don't have the GSE IDs, just the GSM IDs.
Could you please give me a couple of those?
Hey, let's work with GSM85508 for example.
Open https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM85508 and scroll down. You will find a label named "Series", with the GSE ID next to it. Click on the GSE ID, and at the bottom of that page there will be a link to a compressed file. The compressed file contains the .cel files.
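If you'd rather not click through for every sample, the same page can be requested as plain SOFT text and the series ID read from it. This is only a sketch: the `targ=self&form=text&view=brief` parameters and the `Sample_series_id` field name are what I understand GEO to use, so check them against one real record first.

```python
import urllib.parse
import urllib.request


def soft_url(accession):
    """URL for the plain-text (SOFT) view of a single GEO record."""
    params = {"acc": accession, "targ": "self", "form": "text", "view": "brief"}
    return ("https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?"
            + urllib.parse.urlencode(params))


def parse_soft(text):
    """Collect `!Key = value` lines into a dict of lists (keys can repeat)."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("!") and " = " in line:
            key, value = line[1:].split(" = ", 1)
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def series_for_sample(accession):
    """Download one sample record and return its series (GSE) accessions."""
    with urllib.request.urlopen(soft_url(accession)) as response:
        record = parse_soft(response.read().decode("utf-8"))
    return record.get("Sample_series_id", [])
```

`parse_soft` keeps every field, so the same dict also gives you the title, source name and other sample attributes, not just the series.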
Thanks, but I'm interested in the bulk metadata download. It should be possible with this, but it's not working: