Question: How to get data from GEO
Asked 2.3 years ago by wenbinm (USA):

Hi there,

I am using the R package GEOquery to download data from GEO. I use:

library(GEOquery)
library(Biobase)
data <- getGEO('GSE2034')
data <- as.data.frame(exprs(data[[1]])) #extracting expression data

This downloads a file named "GSE2034_family.soft.gz", and so far it works well. But another time, I tried reading "GSE2034_family.soft.gz" directly:

library(GEOquery)
library(Biobase)
data <- getGEO(filename = 'GSE2034_family.soft.gz' )
data <- as.data.frame(exprs(data[[1]]))

Then I got

"Error in data[[1]] : this S4 class is not subsettable"

Does anyone know how to fix this?

Thank you!

Tag: microarray (question last modified by Kevin Blighe, 2.3 years ago)
Answer by Kevin Blighe (Republic of Ireland), 2.3 years ago:

Edit (1st September 2018): for a quick overview of the different GEO file types, see the Biostars answer "A: Parsing values from GSE file".

----------------------------

With your first chunk of code, you are obtaining the 'series matrix' data, which, in the vast majority of cases, is already normalized and log2-transformed. getGEO() returns a list with one ExpressionSet per platform (here just one, for GPL96); the ExpressionSet is the standard Bioconductor container for microarray data:

data <- getGEO('GSE2034', GSEMatrix=TRUE)

data

$GSE2034_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 286 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM36777 GSM36778 ... GSM37062 (286 total)
  varLabels: title geo_accession ... bone relapses (1=yes, 0=no):ch1
    (28 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL96

You can proceed to downstream analyses with this data, accessed via exprs(data[[1]]).
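Because getGEO(..., GSEMatrix = TRUE) returns a list with one ExpressionSet per platform (GPL), it can help to check what the list holds before picking data[[1]]. A minimal sketch, assuming only Biobase is installed; summarise_gse_list() is a hypothetical helper name, not part of GEOquery or Biobase:

```r
library(Biobase)

# Hypothetical helper: one row per ExpressionSet in the list returned by
# getGEO(..., GSEMatrix = TRUE), i.e. one row per GPL platform in the series.
summarise_gse_list <- function(gse_list) {
  data.frame(
    file      = names(gse_list),                                        # e.g. "GSE2034_series_matrix.txt.gz"
    platform  = vapply(gse_list, annotation, character(1)),             # GPL accession
    features  = vapply(gse_list, function(es) nrow(exprs(es)), integer(1)),
    samples   = vapply(gse_list, function(es) ncol(exprs(es)), integer(1)),
    row.names = NULL
  )
}
```

For GSE2034, summarise_gse_list(data) should give a single row (GPL96, 22283 features, 286 samples), matching the printout above; a series with several platforms would give several rows, and you would pick the one you need by name rather than by position.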

------------------------------------------------

Note that, on the GEO page for GSE2034, there's a big blue button at the bottom labelled ANALYZE WITH GEO2R.

Click on that and then go to the R script tab. There, you'll find a ready-made way to read in what is [usually] the normalized data.

Kevin


Thank you for your response! Sorry, I made a mistake here: getGEO('GSE2034') downloads the 'series matrix' data. I ran into the problem when I tried to read in the downloaded series matrix file directly:

data <- getGEO(filename = 'GSE2034_series_matrix.txt.gz' )
data <- as.data.frame(exprs(data[[1]]))

...and got the error. I am just looking for a way to use local files instead of downloading every time. data <- getGEO('GSE2034') will download the data again, right?

Reply by wenbinm, 2.2 years ago.

What is the error? Yes, you can just download the series matrix file and then load it with:

gse <- getGEO(filename="GSE2034_series_matrix.txt.gz")

Then, access the normalised expression values with:

exprs(gse)

...or, if gse is the list returned by getGEO('GSE2034') rather than a single ExpressionSet:

exprs(gse[[1]])
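The two forms are not interchangeable: getGEO('GSE2034') returns a list of ExpressionSets, whereas getGEO(filename = ...) on a series matrix file returns a single ExpressionSet. On a single ExpressionSet, [[ does not fail; Biobase defines it to read phenoData columns, so gse[[1]] is the first sample-annotation column (a factor or character vector), and calling exprs() or pData() on it gives an 'unable to find an inherited method ... signature "factor"' style of error. (For a family SOFT file, getGEO() returns a GSE object, which has no [[ method at all, consistent with 'this S4 class is not subsettable' in the question.) A small sketch that accepts either form; as_eset() is a hypothetical helper, and the toy ExpressionSet uses made-up values purely for illustration:

```r
library(Biobase)

# Hypothetical helper: return an ExpressionSet whether `x` is the list from
# getGEO("GSExxxx") or the single ExpressionSet from getGEO(filename = ...).
as_eset <- function(x, which = 1) {
  if (is(x, "ExpressionSet")) return(x)
  if (is.list(x) && length(x) > 0 &&
      all(vapply(x, is, logical(1), "ExpressionSet"))) {
    return(x[[which]])
  }
  # e.g. a GSE object parsed from a family SOFT file ends up here
  stop("expected an ExpressionSet or a list of ExpressionSets, got: ", class(x)[1])
}

# Toy ExpressionSet (made-up values) to show what [[ does on a single set
m  <- matrix(c(5.1, 7.3, 6.2, 8.0), nrow = 2,
             dimnames = list(c("1007_s_at", "1053_at"), c("S1", "S2")))
pd <- AnnotatedDataFrame(data.frame(title = c("tumour 1", "tumour 2"),
                                    row.names = c("S1", "S2")))
es <- ExpressionSet(assayData = m, phenoData = pd)

es[[1]]                          # the "title" phenoData column, not expression values
exprs(as_eset(es))               # 2 x 2 expression matrix
exprs(as_eset(list(toy = es)))   # same matrix, starting from the list form
```

With this, exprs(as_eset(gse)) works the same whether gse came from the accession or from a local series matrix file.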

--------------------------------

If you run getGEO('GSE2034', GSEMatrix=TRUE) twice in the same session, then it will use the data that was already downloaded:

data <- getGEO('GSE2034', GSEMatrix=TRUE)
Found 1 file(s)
GSE2034_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2034/matrix/GSE2034_series_matrix.txt.gz'
Content type 'application/x-gzip' length 14344700 bytes (13.7 MB)
==================================================
downloaded 13.7 MB


data <- getGEO('GSE2034', GSEMatrix=TRUE)
Found 1 file(s)
GSE2034_series_matrix.txt.gz
Using locally cached version: /tmp/RtmppE74xT/GSE2034_series_matrix.txt.gz
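The cache in this output lives in tempdir() (the /tmp/RtmppE74xT/ path), so it disappears when R exits, and, judging by the 'Found 1 file(s)' message, getGEO() still contacts GEO to list the series files before using the cached copy. To reuse a local copy across sessions, one option is to save the file somewhere permanent with the destdir argument and read it back with getGEO(filename = ...) whenever it already exists. A sketch, assuming a single-platform series (multi-platform series use file names like GSExxxx-GPLyyy_series_matrix.txt.gz); load_series_matrix(), series_matrix_path() and the geo_cache directory are hypothetical names:

```r
# Hypothetical helper: expected local path of a single-platform series matrix.
series_matrix_path <- function(accession, cache_dir) {
  file.path(cache_dir, paste0(accession, "_series_matrix.txt.gz"))
}

# Hypothetical helper: read a series matrix from a local cache directory,
# downloading it with GEOquery only if it is not already there.
load_series_matrix <- function(accession, cache_dir = "geo_cache") {
  path <- series_matrix_path(accession, cache_dir)
  if (file.exists(path)) {
    # Offline: parse the local file; this returns a single ExpressionSet
    return(GEOquery::getGEO(filename = path))
  }
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Online: download into cache_dir; this returns a list of ExpressionSets
  gse_list <- GEOquery::getGEO(accession, GSEMatrix = TRUE, destdir = cache_dir)
  gse_list[[1]]
}
```

Usage would be gse <- load_series_matrix("GSE2034") followed by exprs(gse); both branches return a single ExpressionSet, so no [[1]] is needed afterwards.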
Reply by Kevin Blighe, 2.2 years ago.
I'm getting an error:

library(GEOquery)
library(Biobase)

gse <- getGEO("GSE53987", GSEMatrix=TRUE) # you want GSEMatrix = TRUE
gse <- gse$GSE53987_series_matrix.txt.gz
gse

#data <- getGEO('GSE2034', GSEMatrix=TRUE)

# now get the phenotypic data (covariates etc.) using pData()
pd <- pData(gse)
names(pd)
#library(dplyr)

x <- exprs(gset[[1]])
x <- x[-grep('^AFFX', rownames(x)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gse[[1]])) %in%
               c('age:ch1'))

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘pData’ for signature ‘"factor"’

Now I downloaded the data fresh, and I see:

library(GEOquery)
library(Biobase)

gse <- getGEO("GSE53987", GSEMatrix=TRUE) # you want GSEMatrix = TRUE
gse <- gse$GSE53987_series_matrix.txt.gz
gse

#data <- getGEO('GSE2034', GSEMatrix=TRUE)

# now get the phenotypic data (covariates etc.) using pData()
pd <- pData(gse)
names(pd)
#library(dplyr)

x <- exprs(gse[[1]])

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘exprs’ for signature ‘"factor"’

Reply by krushnach80, 16 months ago.

I managed to fix it. Now it's working.

Reply by krushnach80, 16 months ago.
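For anyone hitting the same errors: in the scripts above, gse has already been reduced to a single ExpressionSet by gse <- gse$GSE53987_series_matrix.txt.gz, so gse[[1]] returns the first phenoData column rather than an ExpressionSet (hence the 'signature "factor"' errors), and gset is never defined. One possible fix, not necessarily the one used in this thread, is to call exprs() and pData() on gse directly. A sketch of the filtering steps as a hypothetical helper, prepare_expression(); it uses grepl() for the AFFX filter because x[-grep(...), ] silently drops every row when no probe name matches:

```r
library(Biobase)

# Hypothetical helper: expression matrix without AFFX control probes, plus one
# phenotype column (e.g. "age:ch1"), taken from a single ExpressionSet.
prepare_expression <- function(eset, covariate = "age:ch1") {
  x  <- exprs(eset)                       # not exprs(eset[[1]])
  pd <- pData(eset)                       # not pData(eset[[1]])
  if (!(covariate %in% colnames(pd))) {
    stop("no phenoData column named ", covariate)
  }
  keep <- !grepl("^AFFX", rownames(x))    # safe even if there are no AFFX probes
  list(expr = x[keep, , drop = FALSE], covariate = pd[[covariate]])
}
```

With the real data this would be gse <- getGEO("GSE53987", GSEMatrix = TRUE)[[1]] followed by res <- prepare_expression(gse, "age:ch1"); note that GEO characteristics such as age:ch1 usually come in as text and may need as.numeric() before modelling.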