Parsing error when loading microarray data from GEO
4.7 years ago
Kim

Hello everyone

I am trying to load microarray data from GEO into R (I download the files to my computer and then load them into R), but whether I use the series matrix file or the SOFT file, I always get the parsing error shown below (code and output). Despite the parsing errors, when I draw a boxplot of the data, the result still looks fine, with all samples normalised. So I'm wondering whether this error affects the expression data at all. And if it does, do you know how I can fix it?

Thank you very much

> #Load data
> GSE61659_matrix <- getGEO(filename = "GSE61659_series_matrix.txt.gz", GSEMatrix = T)
Parsed with column specification:
cols(
  .default = col_double(),
  ID_REF = col_character()
)
See spec(...) for full column specifications.
File stored at: 
C:\Users\k286o\AppData\Local\Temp\Rtmp4wSQw3/GPL1261.soft
Warning: 64 parsing failures.  
  row     col           expected    actual         file    
45038 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data    
45039 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data    
45040 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data   
45041 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data  
45042 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data
 ..... ....... .................. ......... ............   
See problems(...) for more details.  
> View(GSE61659_matrix)   
> GSE61659 <- getGEO(filename = "GSE61659_family.soft.gz", GSEMatrix = T) 
Reading file....  
Parsing....  
Found 58 entities...   
GPL1261 (1 of 59 entities) 
Warning: 64 parsing failures.   
  row     col           expected    actual         file   
45038 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data  
45039 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data   
45040 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data   
45041 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data   
45042 SPOT_ID 1/0/T/F/TRUE/FALSE --Control literal data   
..... ....... .................. ......... ............   
See problems(...) for more details.   
GSM1510154 (2 of 59 entities)   
GSM1510155 (3 of 59 entities)    
GSM1510156 (4 of 59 entities)  
GSM1510157 (5 of 59 entities)   
GSM1510158 (6 of 59 entities)    
GSM1510159 (7 of 59 entities)   
GSM1510160 (8 of 59 entities)   
GSM1510161 (9 of 59 entities)  
GSM1510162 (10 of 59 entities)   
GSM1510163 (11 of 59 entities)
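Note on the warnings above: every failure is in the SPOT_ID column of the platform table (GPL1261), not in a sample column, and the message format ("See spec(...)", "See problems(...)") is readr's. The failing rows start at 45038 and there are 64 of them, which would be the last 64 rows of the 45101-probe platform, and the unparsed value is "--Control", so they appear to be the array's control probes. A likely mechanism, stated as an assumption rather than a reading of the GEOquery source, is that the column type is guessed from the leading rows: SPOT_ID is empty for ordinary probes, gets guessed as logical, and the later "--Control" strings become NA. The base-R sketch below imitates that guess on a tiny hypothetical table (the probe names and `guess_rows` are illustrative only).

```r
# Hypothetical miniature platform table: SPOT_ID is empty for ordinary
# probes and "--Control" only for the trailing control probes.
platform <- data.frame(
  ID      = c("probe_1", "probe_2", "probe_3", "AFFX-control_1"),
  SPOT_ID = c(NA, NA, NA, "--Control"),
  stringsAsFactors = FALSE
)

# Imitate a reader that guesses a column's type from its first rows only
# (guess_rows is an illustrative value, not the real reader's setting).
guess_rows <- 3
looks_logical <- all(is.na(platform$SPOT_ID[seq_len(guess_rows)]))

# Applying the guessed type to the whole column turns "--Control" into NA.
spot_parsed <- if (looks_logical) as.logical(platform$SPOT_ID) else platform$SPOT_ID

# Rows that had a value but lost it are the "parsing failures".
failed_rows <- which(is.na(spot_parsed) & !is.na(platform$SPOT_ID))
print(failed_rows)
```

If this is what happens, only the SPOT_ID entries of those rows are lost; the expression values are read from the sample columns and are unaffected, which fits the normal-looking boxplot.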

Hi, please use the code option to highlight code and data examples. Simply select the relevant parts of the post with the mouse and then click the code button.



Edited. Thank you. I'm quite new to this community :)


No problem ;-)

4.7 years ago

I'm not sure about the exact cause of the parsing errors; however, they relate to the featureData (the platform annotation) for the microarray in question. You can do a 'clean' download that skips the platform annotation (getGPL=FALSE) like this:

library(Biobase)
library(GEOquery)
gset <- getGEO("GSE61659", GSEMatrix =TRUE, getGPL=FALSE)[[1]]
gset

ExpressionSet (storageMode: lockedEnvironment)
assayData: 45101 features, 57 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM1510154 GSM1510155 ... GSM1510210 (57 total)
  varLabels: title geo_accession ... tissue class:ch1 (40 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 25965574 
Annotation: GPL1261

You can then download an annotation table for this microarray (Mouse430_2) via biomaRt, like this:

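library(biomaRt)
# Connect to Ensembl BioMart and select the mouse gene dataset
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('mmusculus_gene_ensembl', mart)
# No filters: one row per gene/probe combination for the whole dataset
annotLookup <- getBM(
  mart = mart,
  attributes = c(
    'affy_mouse430_2',
    'wikigene_description',
    'ensembl_gene_id',
    'gene_biotype',
    'mgi_symbol'))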

This can be used to annotate the probes.
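Annotating is then a lookup on probe ID: the affy_mouse430_2 column of annotLookup holds the same kind of probe IDs as rownames(exprs(gset)). One thing to watch is that getBM() returns one row per probe/gene pair, so a probe can appear several times, and probes without a gene mapping typically do not appear at all. The sketch below uses small hypothetical stand-ins for exprs(gset) and annotLookup (the probe and gene names are made up) and collapses multiple symbols per probe with paste(); keeping only the first match would be an equally valid choice.

```r
# Hypothetical stand-ins for exprs(gset) and the getBM() result.
expr <- matrix(c(7.1, 8.4, 5.2, 6.0, 9.3, 4.8), nrow = 3,
               dimnames = list(c("probe_a", "probe_b", "probe_c"),
                               c("GSM_1", "GSM_2")))
annotLookup <- data.frame(
  affy_mouse430_2 = c("probe_a", "probe_b", "probe_b"),
  mgi_symbol      = c("GeneA", "GeneB1", "GeneB2"),
  stringsAsFactors = FALSE
)

# A probe can map to several genes: collapse its symbols into one string.
symbols_by_probe <- vapply(
  split(annotLookup$mgi_symbol, annotLookup$affy_mouse430_2),
  function(s) paste(unique(s), collapse = "; "),
  character(1)
)

# Look up every row of the expression matrix; unmapped probes get NA.
idx <- match(rownames(expr), names(symbols_by_probe))
annotation <- data.frame(
  probe  = rownames(expr),
  symbol = unname(symbols_by_probe[idx]),
  stringsAsFactors = FALSE
)
print(annotation)
```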

Kevin


Hello Kevin, thank you very much for your reply. I have never been able to get data directly from GEO; it always fails like this:

> gset <- getGEO("GSE61659", GSEMatrix =TRUE, getGPL=FALSE)[[1]]
Error in open.connection(x, "rb") : 
  Timeout was reached: Connection timed out after 10000 milliseconds

So I tried loading a downloaded SOFT file instead (here for a different series, GSE126327), and this is the result:

> #Load data
> GSE126327 <- getGEO(filename = "GSE126327_family.soft.gz", GSEMatrix = TRUE, getGPL = FALSE)[[1]]
Reading file....
|======================================================================================================================| 100%  249 MB
Parsing....
Found 16 entities...
GPL16570 (1 of 17 entities)
|======================================================================================================================| 100%  238 MB
Warning: 190 parsing failures.
  row col expected         actual         file
41612  ID a double AFFX-BioB-3_at literal data
41613  ID a double AFFX-BioB-3_st literal data
41614  ID a double AFFX-BioB-5_at literal data
41615  ID a double AFFX-BioB-5_st literal data
41616  ID a double AFFX-BioB-M_at literal data
..... ... ........ .............. ............
See problems(...) for more details.
Error in getGEO(filename = "GSE126327_family.soft.gz", GSEMatrix = TRUE,  : 
  this S4 class is not subsettable

> GSE126327
 [1] Normal choroid plexus, biological rep1    Normal choroid plexus, biological rep2    Normal choroid plexus, biological rep3   
 [4] Normal choroid plexus, biological rep4    Normal choroid plexus, biological rep5    Choroid plexus carcinoma, biological rep1
 [7] Choroid plexus carcinoma, biological rep2 Choroid plexus carcinoma, biological rep3 Choroid plexus carcinoma, biological rep4
[10] Choroid plexus carcinoma, biological rep5 Choroid plexus papilloma, biological rep1 Choroid plexus papilloma, biological rep2
[13] Choroid plexus papilloma, biological rep3 Choroid plexus papilloma, biological rep4 Choroid plexus papilloma, biological rep5
15 Levels: Choroid plexus carcinoma, biological rep1 ... Normal choroid plexus, biological rep5
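The parsing warnings in this run follow the same pattern as before, this time with the numeric-looking ID column of GPL16570 meeting the AFFX control IDs. The final error is a separate issue: as far as I know, when getGEO() is given a family SOFT file it returns a single GSE object (an S4 class) rather than the list of ExpressionSets that the series-matrix route returns, so the trailing [[1]] fails. Because that assignment failed, the GSE126327 printed afterwards is presumably an object left over from earlier in the session. The base-R sketch below reproduces the same message with a hypothetical toy class; with GEOquery, the samples and platforms of a GSE object are reached with its GSMList() and GPLList() accessors instead of [[.

```r
# Hypothetical toy S4 class standing in for GEOquery's GSE class: it stores
# its samples in a slot but defines no "[[" method.
setClass("ToySeries", slots = c(samples = "list"))
series <- new("ToySeries", samples = list(GSM_1 = 1:3, GSM_2 = 4:6))

# Subsetting an S4 object that has no "[[" method fails with the same message.
msg <- tryCatch(series[[1]], error = function(e) conditionMessage(e))
print(msg)

# Access the contents through the slot instead (for a real GSE object,
# through an accessor such as GSMList()).
first_sample <- series@samples[[1]]
print(first_sample)
```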

What's your opinion on this? Thank you


Are you running R on a local machine or on a cluster? Can you restart your R session to clear the cache and then try again? The code runs successfully here (I am currently in the Americas, but not in the USA).


I'm not running R on a cluster but on a computer at my institution in Germany. The institution's network blocks access to FTP sites, so I cannot get files directly from GEO. I restarted R and tried your code, but the result is unchanged. I don't know whether the FTP block affected the SOFT file when I downloaded it, but since the code works for you, it seems the network plays a role here :(


I see. What about:

getGEO(filename = "GSE61659_series_matrix.txt.gz", GSEMatrix = TRUE, getGPL=FALSE)

?


Thanks Kevin. This command works. But now I'm stuck at this:

mart <- useMart('ENSEMBL_MART_ENSEMBL')

I get a time-out error. I guess it has something to do with the FTP block again. However, I can download the GPL file from GEO, so I'm thinking of using the GPL file for annotation instead. Do you know how I can do that?
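Using a downloaded GPL file works the same way as the biomaRt table: read it, then match its ID column against the probe IDs. The sketch below assumes a tab-delimited GPL table as exported from the GEO platform page, with '#' description lines at the top and columns named ID and Gene Symbol (check the header of your own file); the excerpt and expression values are hypothetical. Reading every column as character avoids the kind of type guessing behind the earlier parsing warnings. Alternatively, GEOquery can parse a GPL SOFT file with getGEO(filename = ...), and Table() on the result returns its data table as a data frame.

```r
# Hypothetical excerpt of a GPL table exported from GEO; a real file would be
# read with read.delim("<your GPL file>", ...) using the same arguments.
gpl_text <- paste(
  "#ID = Affymetrix Probe Set ID",
  "#Gene Symbol = gene symbol",
  "ID\tGene Symbol",
  "probe_a\tGeneA",
  "probe_b\t",
  "AFFX-control_1\t",
  sep = "\n"
)

# Read every column as character so IDs are never guessed as numbers,
# skip the '#' description lines, and keep the original column names.
gpl <- read.delim(text = gpl_text, comment.char = "#",
                  colClasses = "character", check.names = FALSE)

# Hypothetical expression matrix whose row names are the probe IDs.
expr <- matrix(c(7.1, 8.4, 5.2), ncol = 1,
               dimnames = list(c("probe_b", "probe_a", "AFFX-control_1"), "GSM_1"))

# Attach symbols in the row order of the expression matrix.
symbol <- gpl[["Gene Symbol"]][match(rownames(expr), gpl$ID)]
print(data.frame(probe = rownames(expr), symbol = symbol))
```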


Ah, could you try:

mart <- useMart(host="useast.ensembl.org",
  biomart="ENSEMBL_MART_ENSEMBL")

Sometimes there are problems with the default mirror being used by useMart(). This happens to us all.


It still doesn't work:

mart <- useMart(host="useast.ensembl.org",
+                 biomart="ENSEMBL_MART_ENSEMBL")
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection timed out after 10000 milliseconds

I downloaded the GPL files and realized that the probes that lack annotation in the series matrix files also lack annotation in the GPL files. Do you think this is an annotation issue on Affymetrix's side rather than a software or network error? If the problem comes from the manufacturer, I think I would discard those probes.
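If the same probes lack a symbol in both the series matrix and the GPL table, that suggests the gaps are in the platform annotation itself rather than a download or parsing problem; an empty symbol typically means no gene was assigned to that probe (control probes, for example) when the annotation was made. Dropping such probes is a common analysis choice. A minimal base-R sketch with hypothetical data, treating both NA and an empty string as "no annotation":

```r
# Hypothetical expression matrix and matching symbol vector (as produced by
# looking probes up in the GPL table); "" or NA means no annotation.
expr <- matrix(c(7.1, 8.4, 5.2, 6.6), ncol = 1,
               dimnames = list(c("probe_a", "probe_b", "AFFX-control_1", "probe_c"),
                               "GSM_1"))
symbol <- c("GeneA", NA, "", "GeneC")

# Flag probes that have a non-missing, non-empty gene symbol.
annotated <- !is.na(symbol) & nzchar(symbol)

# Keep only annotated probes; drop = FALSE keeps the matrix shape.
expr_annotated <- expr[annotated, , drop = FALSE]
cat("Dropped", sum(!annotated), "of", nrow(expr), "probes without a symbol\n")
```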
