I am planning to use the Bioconductor package GEOquery to download a couple of microarray datasets from NCBI GEO.
http://www.bioconductor.org/packages/release/bioc/html/GEOquery.html
Then I would like to export a subset of the metadata and the expression data to flat files that I can import elsewhere.
What I have so far is:
    library(GEOquery)
    library("R.utils")

    geo_id <- "GSE45016"
    gse <- getGEO(geo_id, GSEMatrix=FALSE)

    # show metadata
    Meta(gse)

    # show metadata for first sample
    GSMList(gse)[[1]]

    # select specific field from metadata of first sample
    GSMList(gse)[[1]]@header$characteristics_ch1
    # Result for sample 1
    [1] "tissue: normal prostate (NP) epithelial cells"

    GSMList(gse)[[2]]@header$characteristics_ch1
    # Result for sample 2
    [1] "tissue: prostate cancer cells"   "clinical stage: clinical T4N0M1"
    [3] "gleason score: GS 9"             "psa level: PSA 5477ng/ml"
As you can see, the number of key-value pairs differs between sample 1 and sample 2. What I would like to have is an array for every key under @header$characteristics_ch1, holding the value, or null if the key is missing, for every sample in the GEO dataset:
    key_tissue: normal prostate (NP) epithelial cells\tprostate cancer cells
    key_psa_level: null\tPSA 5477ng/ml
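One possible way to sketch this in base R, assuming every entry has the form `key: value`. The helper names `parse_characteristics` and `characteristics_table` and the sample labels `sample1`/`sample2` are made up for illustration; the input strings are the two results shown above. If a key occurs twice within one sample, this sketch keeps the last value.

```r
# Hypothetical helper: turn c("key: value", ...) into a named character vector.
parse_characteristics <- function(x) {
  keys   <- trimws(sub(":.*$", "", x))      # text before the first colon
  values <- trimws(sub("^[^:]*:", "", x))   # text after the first colon
  setNames(values, keys)
}

# Hypothetical helper: one row per key, one column per sample, NA where missing.
characteristics_table <- function(samples) {
  parsed   <- lapply(samples, parse_characteristics)
  all_keys <- unique(unlist(lapply(parsed, names), use.names = FALSE))
  tab <- matrix(NA_character_, nrow = length(all_keys), ncol = length(parsed),
                dimnames = list(all_keys, names(parsed)))
  for (j in seq_along(parsed)) {
    tab[names(parsed[[j]]), j] <- parsed[[j]]
  }
  tab
}

# Stand-in for the real data. With GEOquery loaded this would presumably be:
#   samples <- lapply(GSMList(gse), function(g) Meta(g)$characteristics_ch1)
samples <- list(
  sample1 = "tissue: normal prostate (NP) epithelial cells",
  sample2 = c("tissue: prostate cancer cells",
              "clinical stage: clinical T4N0M1",
              "gleason score: GS 9",
              "psa level: PSA 5477ng/ml")
)

tab <- characteristics_table(samples)

# Tab-separated export; missing keys are written as "null".
out_file <- tempfile(fileext = ".tsv")
write.table(tab, out_file, sep = "\t", quote = FALSE, na = "null", col.names = NA)
```

Each row of `tab` is then the per-key array described above, and `na = "null"` controls how missing values appear in the flat file.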
Other metadata fields, like "title", luckily only have a single value beneath them.

    GSMList(gse)[[1]]@header$title = "Normal prostate"
    GSMList(gse)[[2]]@header$title = "High-grade PC1"

I would also like to have these in an array under the key title.
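For single-valued fields, a minimal sketch could look like the following. It uses plain lists as stand-ins for the sample headers; the labels `sample1`/`sample2` are placeholders, and the titles are the two shown above.

```r
# Stand-in for the per-sample headers. With GEOquery loaded this would
# presumably be: headers <- lapply(GSMList(gse), Meta)
headers <- list(
  sample1 = list(title = "Normal prostate"),
  sample2 = list(title = "High-grade PC1")
)

# vapply() enforces exactly one character value per sample and fails loudly
# if a sample has zero or several titles.
titles <- vapply(headers, function(h) h$title, character(1))

# One line in the same "key: value<TAB>value" layout as above.
title_line <- paste0("key_title: ", paste(titles, collapse = "\t"))
```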
My second question is how to export the expression data that is stored under every sample. I would like to stream through all the probes, get the expression values for each probe across all samples, and write them to another CSV file.
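A rough sketch of the probe-by-sample export, assuming each sample's data table has an `ID_REF` column with the probe ID and a numeric `VALUE` column. The probe names and numbers below are invented placeholders, not values from GSE45016.

```r
# Stand-in for the per-sample tables. With GEOquery loaded this would
# presumably be: sample_tables <- lapply(GSMList(gse), Table)
sample_tables <- list(
  sample1 = data.frame(ID_REF = c("probe_a", "probe_b", "probe_c"),
                       VALUE  = c(1.5, 2.0, 3.25)),
  sample2 = data.frame(ID_REF = c("probe_b", "probe_a"),
                       VALUE  = c(2.5, 1.0))
)

# Union of all probe IDs, in order of first appearance.
probes <- unique(unlist(lapply(sample_tables,
                               function(t) as.character(t$ID_REF)),
                        use.names = FALSE))

# For each sample, look up its value per probe; absent probes become NA.
expr <- do.call(cbind, lapply(sample_tables, function(t) {
  t$VALUE[match(probes, as.character(t$ID_REF))]
}))
rownames(expr) <- probes

# One row per probe, one column per sample.
expr_file <- tempfile(fileext = ".csv")
write.csv(expr, expr_file)
```

Matching on the probe ID rather than on row position means samples do not need to list their probes in the same order. If all samples share one platform, calling `getGEO` with `GSEMatrix=TRUE` and applying `exprs()` to the returned ExpressionSet may give the same matrix more directly.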