Question: getGEO not returning correct sample names
0
gravatar for bjwiley23
5 months ago by
bjwiley2320
bjwiley2320 wrote:

Hi,

Normally when I use getGEO for microarray experiment, the sampleNames() defaults to the "GSM**" name but not for GEO="GSE6691". It's defaulting the sampleNames to the data for probe 201088_at. Not sure why this is so. It works perfectly fine for other GEO IDs such as "GSE19147". Why is this? Here is code and output. There is also supposed to be 22283 features but I only get 21667. I get same parsing errors for "GSE19147" but I get correct sampleNames and correct 22283 features.

Input:

my.gse2 <- "GSE6691"
my.geo.gse2 <- getGEO(GEO=my.gse2,
                      destdir="./geo_downloads",
                      getGPL=T)

Output:

Parsed with column specification:
cols(
  .default = col_double(),
  `201088_at` = col_character()
)
See spec(...) for full column specifications.
Using locally cached version of GPL96 found here:
./geo_downloads/GPL96.soft 
Warning: 68 parsing failures.
  row     col           expected    actual         file
22216 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22217 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22218 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22219 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
22220 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
..... ....... .................. ......... ............
See problems(...) for more details.

Input:

my.geo.gse2
sampleNames(my.geo.gse2)

Output:

$GSE6691_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 21667 features, 56 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 5.82592 6.799907 ... 5.8102 (56 total)
  varLabels: title geo_accession ... data_row_count (52 total)
  varMetadata: labelDescription
featureData
  featureNames: 201089_at 201090_x_at ... AFFX-TrpnX-M_at (21667 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 17252022 
Annotation: GPL96


[1] "5.82592"  "6.799907" "5.388776" "6.630125" "7.093243" "6.426071" "6.482608" "6.363402"
 [9] "5.429877" "6.561914" "7.302787" "7.182047" "6.25908"  "8.237549" "6.504661" "6.386279"
[17] "6.709859" "5.855908" "6.686627" "6.401285" "6.066935" "7.919946" "6.703274" "7.679963"
[25] "8.246524" "7.534912" "7.483763" "7.917386" "5.945879" "5.479577" "5.179641" "5.507846"
[33] "5.033364" "7.497475" "8.670058" "6.941275" "7.268032" "8.177226" "7.658442" "6.746167"
[41] "5.127423" "7.357877" "7.106516" "6.029429" "6.865016" "4.837314" "6.368258" "6.591957"
[49] "5.837441" "6.472433" "6.593988" "7.668101" "5.39544"  "4.960329" "6.512131" "5.8102"
getgeo R • 219 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by bjwiley2320

Well at least I know where the problem is but as to why I am still concerned. That probe 201088_at is at line 616 which is the exact amount of probes missing from the getGEO() ExpressionSet. It is starting at line 616 of the data using the data from that line as the "header" or sample names. Where would I report something like this?

ADD REPLYlink written 5 months ago by bjwiley2320
1

GEOquery documentation here indicates where to report issues: https://bioconductor.org/packages/release/bioc/vignettes/GEOquery/inst/doc/GEOquery.html#reporting-problems-or-bugs

ADD REPLYlink modified 5 months ago • written 5 months ago by Ahill1.9k

FWIW it seems this might be an unexpected skip behavior in readr library's read_tsv for this particular GSE. Which might be worth reporting at the places recommended in the link above. Presuming you have downloaded and unzipped the series matrix file, using the traditional read.delim in place of read_tsv:

fname <- "GSE6691_series_matrix.txt"
dat <- read_lines(fname)
series_table_begin_line = grep("^!series_matrix_table_begin", dat)
datamat <- read.delim(fname, skip=series_table_begin_line, quote='"', comment.char="!")
dim(datamat)
datamat[1:3,1:3]

shows the expected result, a 22283x57 matrix.

ADD REPLYlink modified 5 months ago • written 5 months ago by Ahill1.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1808 users visited in the last hour
_