Geo Profiles: Downloading Profile Data.
13.2 years ago

Hi all,

I'm looking for a programmatic way to download the data from NCBI GeoProfiles. For example, I would like to download the data for IL2 and the GSE1133 set.

Using Entrez, the following query:

http://www.ncbi.nlm.nih.gov/geoprofiles?term=IL2[SYMB]+%22Homo+sapiens%22[ORGN]%20GSE1133[ACCN]

returns an HTML page with a 'Download profile data' button. Clicking it produces a tab-delimited file like this:

GDS596
ID_REF    GSM18927    GSM18928    GSM18915    GSM18916    GSM18939    GSM18940    GSM18933    GSM18934    GSM18925    GSM18926    GSM18931    GSM18932    GSM19019    GSM19020    GSM18923    GSM18924    GSM18941    GSM18942    GSM18929    GSM18930    GSM18911    GSM18912    GSM18935    GSM18936    GSM19005    GSM19006    GSM18921    GSM18922    GSM18919    GSM18920    GSM18917    GSM18918    GSM18913    GSM18914    GSM18937    GSM18938    GSM18943    GSM18944    GSM19003    GSM19004    GSM19011    GSM19012    GSM19009    GSM19010    GSM18945    GSM18946    GSM18963    GSM18964    GSM18905    GSM18906    GSM18965    GSM18966    GSM18873    GSM18874    GSM18973    GSM18974    GSM18977    GSM18978    GSM18979    GSM18980    GSM18883    GSM18884    GSM18885    GSM18886    GSM18907    GSM18908    GSM18909    GSM18910    GSM18867    GSM18868    GSM18947    GSM18948    GSM18995    GSM18996    GSM18975    GSM18976    GSM18997    GSM18998    GSM18967    GSM18968    GSM18959    GSM18960    GSM19015    GSM19016    GSM18957    GSM18958    GSM18981    GSM18982    GSM18989    GSM18990    GSM18985    GSM18986    GSM18987    GSM18988    GSM18983    GSM18984    GSM18951    GSM18952    GSM19007    GSM19008    GSM18999    GSM19000    GSM18889    GSM18890    GSM18881    GSM18882    GSM18877    GSM18878    GSM18875    GSM18876    GSM18879    GSM18880    GSM18871    GSM18872    GSM18903    GSM18904    GSM18949    GSM18950    GSM18953    GSM18954    GSM19013    GSM19014    GSM18971    GSM18972    GSM18969    GSM18970    GSM18869    GSM18870    GSM19017    GSM19018    GSM18991    GSM18992    GSM19021    GSM19022    GSM19001    GSM19002    GSM18899    GSM18900    GSM18961    GSM18962    GSM18901    GSM18902    GSM18993    GSM18994    GSM18865    GSM18866    GSM18897    GSM18898    GSM18887    GSM18888    GSM18893    GSM18894    GSM18895    GSM18896    GSM18891    GSM18892    GSM18955    GSM18956    Gene title    Gene symbol    Gene ID    UniGene title    UniGene symbol    UniGene ID    
Nucleotide Title    GI    GenBank Accession    Platform_CLONEID    Platform_ORF    Platform_SPOTID    Chromosome location    Chromosome annotation    GO:Function    GO:Process    GO:Component    GO:Function    GO:Process    GO:Component    
tissue    amygdala    amygdala    cerebellum peduncles    cerebellum peduncles    cingulate cortex    cingulate cortex    hypothalamus    hypothalamus    medulla oblongata    medulla oblongata    occipital lobe    occipital lobe    olfactory bulb    olfactory bulb    parietal lobe    parietal lobe    pons    pons    prefrontal cortex    prefrontal cortex    temporal lobe    temporal lobe    thalamus    thalamus    trigeminal ganglion    trigeminal ganglion    whole brain    whole brain    caudate nucleus    caudate nucleus    cerebellum    cerebellum    globus pallidus    globus pallidus    subthalamic nucleus    subthalamic nucleus    spinal cord    spinal cord    ciliary ganglion    ciliary ganglion    superior cervical ganglion    superior cervical ganglion    dorsal root ganglion    dorsal root ganglion    fetal brain    fetal brain    fetal thyroid    fetal thyroid    fetal liver    fetal liver    fetal lung    fetal lung    PB-BDCA4+ dendritic cells    PB-BDCA4+ dendritic cells    bronchial epithelial cells    bronchial epithelial cells    pancreas    pancreas    pancreatic islets    pancreatic islets    BM-CD105+ endothelial    BM-CD105+ endothelial    BM-CD34+    BM-CD34+    BM-CD71+early erythroid    BM-CD71+early erythroid    bone marrow    bone marrow    whole blood    whole blood    adrenal gland    adrenal gland    adrenal cortex    adrenal cortex    adipocyte    adipocyte    ovary    ovary    placenta    placenta    uterus    uterus    uterus corpus    uterus corpus    prostate    prostate    testis    testis    testis seminiferous tubule    testis seminiferous tubule    testis germ cell    testis germ cell    testis interstitial    testis interstitial    testis leydig cell    testis leydig cell    heart    heart    atrioventricular node    atrioventricular node    appendix    appendix    721-B-lymphoblasts    721-B-lymphoblasts    PB-CD19+B cells    PB-CD19+B cells    PB-CD4+T cells    PB-CD4+T cells    PB-CD56+NK cells    PB-CD56+NK cells    
PB-CD8+T cells    PB-CD8+T cells    PB-CD14+monocytes    PB-CD14+monocytes    lymph node    lymph node    lung    lung    liver    liver    skeletal muscle    skeletal muscle    smooth muscle    smooth muscle    cardiac myocytes    cardiac myocytes    BM-CD33+myeloid    BM-CD33+myeloid    tongue    tongue    salivary gland    salivary gland    pituitary    pituitary    skin    skin    thymus    thymus    thyroid    thyroid    tonsil    tonsil    trachea    trachea    colorectal adenocarcinoma    colorectal adenocarcinoma    leukemia chronic myelogenous K562    leukemia chronic myelogenous K562    leukemia lymphoblastic molt4    leukemia lymphoblastic molt4    leukemia promyelocytic hl60    leukemia promyelocytic hl60    lymphoma burkitts daudi    lymphoma burkitts daudi    lymphoma burkitts raji    lymphoma burkitts raji    kidney    kidney
207849_at    35.9    47.1    182    53.7    15.6    16.1    52.4    33.2    111.1    16.3    9.6    91    54.1    33.7    206.5    28.5    103.5    70.2    6.5    14.7    123.1    11.4    109.2    15.1    21.3    199.8    2.6    24.8    85.3    112.8    102.9    47.3    137.3    202.1    172.2    134.4    125.2    81.5    205    434.9    186.8    500.4    102.8    284.5    42.9    49.2    41.7    53.5    95.6    21.9    15.1    25.8    1.9    8.4    10.6    1.6    8.4    50.9    21.2    19.3    10.7    23    27.8    0.5    21.1    8.3    74.7    23.4    29.1    2.9    51.8    53.9    50.5    102.8    33.9    15.8    48.2    7.3    6.5    41.5    31.8    46    95.5    89.8    70.6    14.7    20    24.6    16.9    62.9    71.4    100.6    32.5    10.7    68.5    119.5    169.9    32.5    55.5    395.4    179.7    178.1    12    22.2    1.9    12.9    3.4    28.6    17.2    7.6    14.5    10.1    15    10.3    34.4    33    16.7    6.2    130.2    6.5    415.6    69.9    24.7    19.1    3.6    93.4    40.8    9.2    139.7    74.1    55.2    24.6    44.9    10    100.1    72.2    9.2    23.2    12    24.1    36    15.7    47.5    33.9    42    29.2    39    60.3    9.8    28.9    30.3    13.7    21.2    2.2    8    29.9    209.3    8.3    interleukin 2    IL2    3558                Homo sapiens interleukin 2 (IL2), mRNA    125661059    NM_000586                4q26-q27    Chromosome 4, NC_000004.11 (123372625..123377650, complement)    carbohydrate binding///cytokine activity///glycosphingolipid binding///growth factor activity///interleukin-2 receptor binding///interleukin-2 receptor binding///kappa-type opioid receptor binding///kinase activator activity///protein binding    T cell differentiation///activation of protein kinase C activity by G-protein coupled receptor protein signaling pathway///anti-apoptosis///cell adhesion///cell-cell signaling///elevation of cytosolic calcium ion concentration///immune response///natural killer cell activation///negative 
regulation of B cell apoptosis///negative regulation of heart contraction///negative regulation of inflammatory response///negative regulation of lymphocyte proliferation///negative regulation of protein amino acid phosphorylation///positive regulation of B cell proliferation///positive regulation of activated T cell proliferation///positive regulation of cell growth///positive regulation of cell proliferation///positive regulation of immunoglobulin secretion///positive regulation of isotype switching to IgG isotypes///positive regulation of protein amino acid phosphorylation///positive regulation of regulatory T cell differentiation///positive regulation of transcription from RNA polymerase II promoter///positive regulation of tyrosine phosphorylation of Stat5 protein///regulation of T cell homeostatic proliferation    extracellular region///extracellular space    GO:0030246///GO:0005125///GO:0043208///GO:0008083///GO:0005134///GO:0005134///GO:0031851///GO:0019209///GO:0005515    GO:0030217///GO:0007205///GO:0006916///GO:0007155///GO:0007267///GO:0007204///GO:0006955///GO:0030101///GO:0002903///GO:0045822///GO:0050728///GO:0050672///GO:0001933///GO:0030890///GO:0042104///GO:0030307///GO:0008284///GO:0051024///GO:0048304///GO:0001934///GO:0045591///GO:0045944///GO:0042523///GO:0046013    GO:0005576///GO:0005615
217181_at    18.5    22.2    24.9    15.4    18.6    17.6    12.7    8.6    45.3    31.5    25.6    21.8    9.8    14.1    33.5    25.7    82.3    38.7    6.8    9.7    26.7    43.6    9.7    14.9    52.1    77.3    9.7    15    34.7    17.6    26.3    24.8    41.3    57.2    13.1    17.7    8    11.4    16.5    28.6    69.1    66.5    16.9    21.2    9.8    17.3    16.8    11.9    28.1    16.1    12    16.1    1.6    3.5    7.6    7.9    5.3    12.7    8.3    14.4    1.4    2.3    1.2    1.8    12.1    4.1    25.7    14.8    6.2    21.8    17.1    13.5    20.2    23.5    14.7    14.2    5.2    3.9    13.8    18.1    11.3    26.7    25.7    25.6    20.8    9    16.9    34.1    39.1    30.7    30.4    13.4    27.9    20.4    29.6    47.1    33.8    13.7    23.8    35.5    41.3    60.6    2.9    2.4    2    1    3    2.2    1.4    1.4    2.8    2.6    2.3    3.5    11    10.4    15.3    16.9    22.6    19.1    80.1    85.7    5.5    4.3    14.7    28.7    2.6    2.2    14.9    10.4    7.3    6.6    29.7    8.6    28.2    21.4    12.9    8.4    10.5    13.8    13.9    20.1    12.5    11.8    15.3    23.6    3    11    10.3    8.9    6.3    6.4    6.5    5.4    11.8    15.4    47.6    31.1    interleukin 2    IL2    3558                Human interleukin 2 gene, clone pATtacIL-2C/2TT, complete cds, clone pATtacIL-2C/2TT    186300    M22005                4q26-q27    Chromosome 4, NC_000004.11 (123372625..123377650, complement)    carbohydrate binding///cytokine activity///glycosphingolipid binding///growth factor activity///interleukin-2 receptor binding///interleukin-2 receptor binding///kappa-type opioid receptor binding///kinase activator activity///protein binding    T cell differentiation///activation of protein kinase C activity by G-protein coupled receptor protein signaling pathway///anti-apoptosis///cell adhesion///cell-cell signaling///elevation of cytosolic calcium ion concentration///immune response///natural killer cell activation///negative regulation of B 
cell apoptosis///negative regulation of heart contraction///negative regulation of inflammatory response///negative regulation of lymphocyte proliferation///negative regulation of protein amino acid phosphorylation///positive regulation of B cell proliferation///positive regulation of activated T cell proliferation///positive regulation of cell growth///positive regulation of cell proliferation///positive regulation of immunoglobulin secretion///positive regulation of isotype switching to IgG isotypes///positive regulation of protein amino acid phosphorylation///positive regulation of regulatory T cell differentiation///positive regulation of transcription from RNA polymerase II promoter///positive regulation of tyrosine phosphorylation of Stat5 protein///regulation of T cell homeostatic proliferation    extracellular region///extracellular space    GO:0030246///GO:0005125///GO:0043208///GO:0008083///GO:0005134///GO:0005134///GO:0031851///GO:0019209///GO:0005515    GO:0030217///GO:0007205///GO:0006916///GO:0007155///GO:0007267///GO:0007204///GO:0006955///GO:0030101///GO:0002903///GO:0045822///GO:0050728///GO:0050672///GO:0001933///GO:0030890///GO:0042104///GO:0030307///GO:0008284///GO:0051024///GO:0048304///GO:0001934///GO:0045591///GO:0045944///GO:0042523///GO:0046013    GO:0005576///GO:0005615

Is it possible to fetch these data using the NCBI E-utilities? The format of the query is unclear to me.


Very useful, thanks! I'm curious: how did you find this information?

Use a free Firefox plugin such as Firebug or HTTPFox. For example, open the Firebug "Net" panel, then open the detailed profile image and click the "Display values" button. The Net panel will show the requests the page makes to get the data. (A bit of web development experience helps.)

13.1 years ago
John Doe ▴ 40
  1. Use ESearch to find the profiles of interest and collect the list of UIDs from the output. For the example given (term=IL2[SYMB]+%22Homo+sapiens%22[ORGN]%20GSE1133[ACCN]), the UIDs are 4687368 and 4696548.
  2. Append the comma-separated list of UIDs to this URL: http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi?uid=LIST_OF_UIDS

    Example: http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi?uid=4687368,4696548
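The two steps above could be scripted roughly as follows. This is only a sketch using the Ruby standard library: the ESearch endpoint is the public E-utilities one, the getDatum.cgi URL is taken from this answer and is not a documented API (so it may change or stop working), and the helper names (esearch_url, parse_uids, getdatum_url, profile_data_url) are made up for illustration. The UIDs are pulled out with a simple regular expression rather than a real XML parser.

```ruby
require 'uri'
require 'net/http'

# Public E-utilities ESearch endpoint.
ESEARCH_URL  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
# Undocumented URL quoted in the answer above (assumption: still available).
GETDATUM_URL = 'http://www.ncbi.nlm.nih.gov/geo/gds/getDatum.cgi'

# Step 1a: build the ESearch request URL for a geoprofiles query.
def esearch_url(term, db: 'geoprofiles', retmax: 100)
  query = URI.encode_www_form(db: db, term: term, retmax: retmax)
  "#{ESEARCH_URL}?#{query}"
end

# Step 1b: extract the UIDs from the ESearch XML response.
# A regex is enough for the flat <Id>...</Id> elements in the reply.
def parse_uids(esearch_xml)
  esearch_xml.scan(%r{<Id>(\d+)</Id>}).flatten
end

# Step 2: append the comma-separated UIDs to the getDatum.cgi URL.
def getdatum_url(uids)
  "#{GETDATUM_URL}?uid=#{uids.join(',')}"
end

# Both steps together; this one performs a network request.
def profile_data_url(term)
  xml = Net::HTTP.get(URI(esearch_url(term)))
  getdatum_url(parse_uids(xml))
end
```

Calling profile_data_url('IL2[SYMB] AND "Homo sapiens"[ORGN] AND GSE1133[ACCN]') should then give a URL of the same form as the example above, which can be fetched with Net::HTTP.get in turn.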

13.2 years ago
Neilfws 49k

The Esearch part is not too difficult. In Ruby, for example, it looks like this (from my IRB console):

>> require 'bio'
=> true
>> Bio::NCBI.default_email = "me@me.com"
=> "me@me.com"
>> ncbi   = Bio::NCBI::REST.new
=> #<Bio::NCBI::REST:0x7fd2977238c0>
>> search = ncbi.esearch("GSE1133[ACCN] AND Homo+sapiens[ORGN] AND IL2[SYMB]", {"db" => "geoprofiles"})
=> ["4696548", "4687368"]

I don't think EFetch can be used to download the profile data; it returns only a text summary of each record. The download link on the web page appears to be generated dynamically with JavaScript. Another task for the robots :-)

By the way, here's a full list of terms for searching geoprofiles.


Thanks Neil, but I'm mainly interested in the EFetch part :-)


Thought so. So I could have just written "no" :-)
