Annotate list of GSMxxxx GEO sample IDs with matching source GSExxx dataset identifier?
4 months ago
lkobzik • 0

I have a file of gene expression data produced by ARCHS4: a table with genes in rows and sample IDs in columns (e.g., GSM1132717, GSM1132728, GSM1233280, etc.). There are ~8K columns, since this was a download of all whole blood samples in ARCHS4.

I am looking for advice or tools to batch process the list of sample IDs and find the source GSE dataset for each one. For example, by manual search: GSM1132717 comes from GSE46579, GSM1233280 from GSE50957, etc.

Thanks

GSE archS4 GEO • 548 views
4 months ago
GenoMax 142k

Using EntrezDirect:

$ esearch -db biosample -query  GSM1132717 | elink -target bioproject | efetch -format native -mode xml | xtract -pattern DocumentSummary -element ID
GSE46579

$ esearch -db biosample -query  GSM1233280 | elink -target bioproject | efetch -format native -mode xml | xtract -pattern DocumentSummary -element ID
GSE50957
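
To run this over the whole list rather than one ID at a time, a loop along the following lines might work. This is only a sketch: it reuses the same EntrezDirect pipeline as above, and the function names (`lookup_gse`, `annotate_gsm_list`) and file names are made up for illustration. The `</dev/null` redirect is there because EDirect commands can otherwise consume the loop's standard input. With ~8K IDs this makes one request chain per sample, so expect it to be slow.

```shell
#!/bin/sh
# lookup_gse: map one GSM accession to its GSE ID(s), using the same
# EntrezDirect pipeline shown above (hypothetical wrapper name).
lookup_gse() {
  esearch -db biosample -query "$1" |
    elink -target bioproject |
    efetch -format native -mode xml |
    xtract -pattern DocumentSummary -element ID
}

# annotate_gsm_list: read one GSM ID per line from stdin and print
# "GSM<TAB>GSE" for each; prints NA when the lookup returns nothing.
annotate_gsm_list() {
  while IFS= read -r gsm || [ -n "$gsm" ]; do
    [ -z "$gsm" ] && continue            # skip blank lines
    # Split on tabs, de-duplicate, and join multiple hits with commas.
    gse=$(lookup_gse "$gsm" </dev/null | tr '\t' '\n' | sort -u | paste -s -d, -)
    printf '%s\t%s\n' "$gsm" "${gse:-NA}"
  done
}

# Example usage (file names are assumptions):
#   head -n 1 expression.tsv | tr '\t' '\n' | grep '^GSM' > gsm_ids.txt
#   annotate_gsm_list < gsm_ids.txt > gsm_to_gse.tsv
```

A sample that belongs to more than one series would show up with a comma-separated list in the second column.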
4 months ago
Ankit ▴ 500

I have tried something similar before with a combination of esearch and efetch, but I can't recall the syntax. You can check the GEO programmatic access page: https://www.ncbi.nlm.nih.gov/geo/info/geo_paccess.html

Another possibility is the R package GEOquery.
