Question: GPL data doesn't match data in GSM record using GEOquery getGEO
Asked 6.9 years ago by Kenneth Daily (Bethesda, MD):

I'm trying to get the data for the following publication:

http://www.nature.com/nature/journal/v447/n7147/full/nature05886.html

The specific GEO accession I'm trying to retrieve is GSE7606 (it's a sub-series of GSE7615, the full dataset for the paper mentioned). I've used GEOquery to download the supplementary files for the series and extracted them:

require(GEOquery)
gseid <- "GSE7606"
supp.melanoma <- getGEOSuppFiles(gseid)
## manually un-tar/gunzip them

Since they are CGH profiles, I'm reading them as such:

require(limma)
datapath <- "/path/to/data/GSE7606/"
# pattern is a regular expression, so anchor it and escape the dot
filenames <- list.files(datapath, pattern="^GSM.*\\.txt$")

cgh.data <- read.maimages(files=filenames,
                          path=datapath,
                          columns=list(G="gMedianSignal", Gb="gBGMedianSignal",
                                       R="rMedianSignal", Rb="rBGMedianSignal"),
                          annotation=c("Row", "Col","FeatureNum", "ControlType","ProbeName",
                                       "ProbeUID", "SystematicName", "GeneName"),
                          source='agilent')

I want to segment them for CGH analysis. For whatever reason, the files don't include the chromosomal locations. OK, so I'll get them from the GPL (which, according to the GSE7606 record, is GPL887). Also of note: the supplementary data includes a txt file that is supposedly an old version of the GPL annotations for these files, which, as we will see, does not work:

# try to get directly from GEO; this works!
gpl887 <- getGEO("GPL887", destdir="./data/GSE7606/")

# try to read from their file; doesn't work!
gpl887.included <- getGEO(filename=paste(datapath, "GPL887_old_annotations.txt", sep="/"))

But their file does not load correctly:

> gpl887.included
An object of class "GPL"
An object of class "GEODataTable"
****** Column Descriptions ******
data frame with 0 columns and 0 rows
****** Data Table ******
data frame with 0 columns and 0 rows

Furthermore, I can't match up IDs from nearly half of the probes in the CGH data with annotations from the GPL data:

> ingpl <- cgh.data$gene$ProbeName %in% Table(gpl887)$SPOT_ID
> summary(ingpl)
   Mode   FALSE    TRUE    NA's 
logical   10295   11858       0
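
One way to narrow down a mismatch like this is to compare the probe names against every column of the platform table rather than `SPOT_ID` alone, in case the matching identifiers live in a different column. A minimal sketch, using a toy data frame in place of `Table(gpl887)`; the helper name `match_rate_by_column`, the column layout, and all identifier values are hypothetical and made up for illustration:

```r
# Hypothetical helper: fraction of probe IDs found in each annotation column.
match_rate_by_column <- function(probe_ids, annot) {
  vapply(annot,
         function(col) mean(probe_ids %in% as.character(col)),
         numeric(1))
}

# Toy stand-ins (invented values) for the probe names and the GPL table
probe_ids <- c("A_14_P100001", "A_14_P100002", "A_14_P100003", "A_14_P100004")
annot <- data.frame(
  ID      = 1:4,
  SPOT_ID = c("A_14_P100001", "A_14_P100002", "other_1", "other_2"),
  NAME    = c("A_14_P100001", "A_14_P100002", "A_14_P100003", "A_14_P100004"),
  stringsAsFactors = FALSE
)

# Match rate per column; a column close to 1 is the better join key
rates <- match_rate_by_column(probe_ids, annot)
print(rates)

# Probes with no match in SPOT_ID, worth inspecting by eye
unmatched <- setdiff(probe_ids, annot$SPOT_ID)
print(unmatched)
```

With the real objects, the same helper could be called on the `ProbeName` vector and `Table(gpl887)`; looking at a handful of unmatched names often shows whether they are control features or follow a different naming scheme.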

I've also tried another GSE that has the same platform, with the same results.

Loading the GSE directly does not work either, and may point to the same problem:

> data.melanoma <- getGEO("GSE7606", destdir=datadir)
Found 1 file(s)
GSE7606_series_matrix.txt.gz
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
100 9709k  100 9709k    0     0  16.9M      0 --:--:-- --:--:-- --:--:-- 17.9M
File stored at: 
/tmp/Rtmp8NZ4Fj/GPL887.soft
Error in validObject(.Object) : 
  invalid class “ExpressionSet” object: featureNames differ between assayData and featureData

What am I missing? How can I get the proper chromosomal coordinates for the probes on this chip?

Thanks!

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=C                LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sva_3.0.3             mgcv_1.7-13           corpcor_1.6.2        
 [4] DAVIDQuery_1.14.0     RCurl_1.91-1          bitops_1.0-4.1       
 [7] GOstats_2.20.0        Category_2.20.0       GEOquery_2.21.9      
[10] topGO_2.6.0           SparseM_0.96          GO.db_2.6.1          
[13] graph_1.32.0          hgu133a2.db_2.6.3     org.Hs.eg.db_2.6.4   
[16] RSQLite_0.11.1        DBI_0.2-5             limma_3.10.3         
[19] annotate_1.32.3       AnnotationDbi_1.16.19 gcrma_2.26.0         
[22] affy_1.32.1           Biobase_2.14.0        ggplot2_0.9.0        
[25] reshape_0.8.4         plyr_1.7.1            ProjectTemplate_0.3-5
[28] testthat_0.6         

loaded via a namespace (and not attached):
 [1] affyio_1.22.0         BiocInstaller_1.2.1   Biostrings_2.22.0    
 [4] colorspace_1.1-1      dichromat_1.2-4       digest_0.5.2         
 [7] evaluate_0.4.1        genefilter_1.36.0     grid_2.14.1          
[10] GSEABase_1.16.1       IRanges_1.12.6        lattice_0.20-6       
[13] MASS_7.3-17           Matrix_1.0-4          memoise_0.1          
[16] munsell_0.3           nlme_3.1-103          preprocessCore_1.16.0
[19] proto_0.3-9.2         RBGL_1.30.1           RColorBrewer_1.0-5   
[22] reshape2_1.2.1        scales_0.2.0          splines_2.14.1       
[25] stringr_0.6           survival_2.36-12      tools_2.14.1         
[28] XML_3.9-4             xtable_1.7-0          zlibbioc_1.0.1

Tags: geo, R, bioconductor

Answer by Vikas Bansal (Berlin, Germany), 6.9 years ago:

I am not sure if this will solve your problem, but in

gpl887.included <- getGEO(filename=paste(datapath, "GPL887_old_annotations.txt", sep="/"))

try sep="", because in

datapath <- "/path/to/data/GSE7606/"

you already have a trailing forward slash. This may be one of the reasons for the problem you mentioned:

> gpl887.included
An object of class "GPL"
An object of class "GEODataTable"
****** Column Descriptions ******
data frame with 0 columns and 0 rows
****** Data Table ******
data frame with 0 columns and 0 rows

I doubt that this is the problem; on Unix-like systems, an extra forward slash in the path does not matter. In addition, I can replicate the issue using the downloaded file.

(comment by Neilfws, 6.9 years ago)
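
The point about the extra slash can be checked without touching the real data. A minimal sketch, assuming a Unix-like system; the file name `demo_annotations.txt` is a placeholder created in a temporary directory, not one of the GEO files:

```r
# Create a throwaway file so the check does not depend on any real data.
dir_with_slash <- paste0(tempdir(), "/")   # trailing slash, like datapath
target <- file.path(tempdir(), "demo_annotations.txt")
writeLines("placeholder", target)

# Joining with sep="/" yields a doubled slash before the file name...
doubled <- paste(dir_with_slash, "demo_annotations.txt", sep = "/")
print(doubled)

# ...but the path should still resolve to the same file.
print(file.exists(doubled))

# Trimming the trailing slash and using file.path() avoids the question.
clean <- file.path(sub("/+$", "", dir_with_slash), "demo_annotations.txt")
print(file.exists(clean))
```

If both checks report that the file exists, the doubled slash can be ruled out as the cause of the empty GPL object.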

Ah! You are right. My bad.

(comment by Vikas Bansal, 6.9 years ago)

Thanks! But yeah, not the culprit. The file is supposed to be in SOFT format, so I examined it and compared it to the GEO format specification (http://www.ncbi.nlm.nih.gov/projects/geo/info/soft2.html), and I didn't see any glaring problems.

(comment by Kenneth Daily, 6.9 years ago)
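
A format comparison like this can also be done programmatically. The sketch below counts the SOFT line types described in the specification (`^` entity lines, `#` column descriptions, and the `!platform_table_begin` / `!platform_table_end` markers) in a file. The helper name `soft_structure` and the toy file contents are hypothetical; whether a missing marker is what trips up `getGEO()` on the supplementary file is an assumption, not something confirmed in this thread.

```r
# Hypothetical helper: summarise the SOFT line types found in a platform file.
soft_structure <- function(path) {
  lines <- readLines(path, warn = FALSE)
  begin <- grep("^!platform_table_begin", lines, ignore.case = TRUE)
  end   <- grep("^!platform_table_end",   lines, ignore.case = TRUE)
  both  <- length(begin) == 1 && length(end) == 1
  list(
    entity_lines = sum(grepl("^\\^", lines)),  # e.g. ^PLATFORM = ...
    column_lines = sum(grepl("^#", lines)),    # column descriptions
    has_begin    = length(begin) == 1,
    has_end      = length(end) == 1,
    # lines between the two markers, minus the header row
    table_rows   = if (both) max(end - begin - 2L, 0L) else NA_integer_
  )
}

# Toy platform file (invented contents) written to a temporary location
soft_file <- tempfile(fileext = ".txt")
writeLines(c(
  "^PLATFORM = GPL_demo",
  "!Platform_title = toy platform",
  "#ID = probe identifier",
  "#SPOT_ID = spot identifier",
  "!platform_table_begin",
  "ID\tSPOT_ID",
  "1\tprobe_a",
  "2\tprobe_b",
  "!platform_table_end"
), soft_file)

res <- soft_structure(soft_file)
str(res)
```

Running the same helper on both the file that loads and the one that does not, then comparing the two summaries, would show quickly whether they differ in any of these structural elements.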

I've edited the post to include a few more things I tried: another GSE, and loading the GSE directly with getGEO (neither works).

(comment by Kenneth Daily, 6.9 years ago)