Question: Puzzling Error Message While Working Through A Bioconductor Tutorial On Microarrays
7.8 years ago by
Mycroft34110
IRCM, Montpellier, France
Mycroft34110 wrote:

I recently tried some tutorials on microarray data analysis, using either the following link: http://bioinformatics.knowledgeblog.org/2011/06/20/analysing-microarray-data-in-bioconductor/ or the chapter on Bioconductor from "R in a Nutshell". After installing and loading the GEOquery package, I tried loading the data as indicated:

library(GEOquery)
getGEOSuppFiles("GSE20986")

and I was returned the following error message, in both cases:

    [1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
      line 1 did not have 6 elements

This is only an example: the file used in "R in a Nutshell" is GSE2034, which produces the same error. As I understand the message, line 1 has a different size from the 6 elements expected for the data.frame. That is surprising for data retrieved from the NCBI server, so I think something else is at fault. Has anyone had this error and found what was wrong and how to get around it? Thanks in advance.

I use R 2.15.1 on Ubuntu 12.04, with Bioconductor 2.10; here is the result of sessionInfo():

    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
     [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] GEOquery_2.23.5     Biobase_2.16.0      BiocGenerics_0.2.0
    [4] BiocInstaller_1.4.7

    loaded via a namespace (and not attached):
    [1] RCurl_1.91-1 tools_2.15.1 XML_3.9-4
ADD COMMENTlink modified 5.7 years ago by zhanxw20 • written 7.8 years ago by Mycroft34110

Did you check that your libcurl supports ftp? http://www.omegahat.org/RCurl/FAQ.html

ADD REPLYlink written 7.8 years ago by brentp23k

Thanks for your help; I followed what was indicated on this page:

curl -V
curl 7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap pop3 pop3s rtmp rtsp smtp smtps telnet tftp 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

It seems that curl supports ftp; is that different from libcurl?

ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Mycroft34110

Did you get a message that looked like:

[1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986//GSE20986_RAW.tar'
ftp data connection made, file length 56360960 bytes
opened URL
==================================================
downloaded 53.8 Mb
ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Sean Davis26k

Thanks for your help. No, I didn't. I see what you mean: the double // would be the source of the problem. But I only got the message shown above.

However, I remember seeing such a message in another circumstance; how did you solve this problem?

For the moment, and specifically for the web tutorial (http://bioinformatics.knowledgeblog.org/2011/06/20/analysing-microarray-data-in-bioconductor/), I got around the block by downloading the files manually.

But the problem remains for the "R in a Nutshell" example.

ADD REPLYlink modified 7.8 years ago by Istvan Albert ♦♦ 84k • written 7.8 years ago by Mycroft34110
7.8 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

This works for me with sessionInfo pasted below. Perhaps try setting LC_ALL=C to test as only our locales differ.

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.23.5     Biobase_2.16.0      BiocGenerics_0.2.0 
[4] BiocInstaller_1.4.7

loaded via a namespace (and not attached):
[1] RCurl_1.91-1 XML_3.9-4    tools_2.15.1

EDIT:

Is your libcurl built with FTP support? http://www.omegahat.org/RCurl/FAQ.html

ADD COMMENTlink modified 7.8 years ago • written 7.8 years ago by brentp23k

Thanks for your reply; I tried the LC_ALL=C setting, but the error remained.

ADD REPLYlink written 7.8 years ago by Mycroft34110

Works for me too, with locale = en_AU.UTF-8. The error message is misleading; it just means that getGEOSuppFiles() could not access the remote file(s) for some reason. Possibly a transient network error.
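
One way to see why the message is misleading: the same scan() error shows up whenever read.table() is given text with fewer fields than it was told to expect, for example an error page or an empty reply in place of a directory listing. A minimal sketch with made-up input (this is not GEOquery's actual code, and the six column names are placeholders):

```r
# Made-up stand-in for a reply that is not the expected 6-column listing.
bad_reply <- "proxy authentication required"

# read.table() is told to expect 6 columns; the parsing error is caught and
# its message is returned as a string instead of stopping the script.
msg <- tryCatch(
  {
    read.table(text = bad_reply, col.names = paste0("V", 1:6))
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The point is only that the error describes the parsing step, not the network step that went wrong before it.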

ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Neilfws48k

Thanks for this info. I also thought the network could be the problem: all internet traffic in our institute passes through a proxy. Could that be the cause of the error, and how could I get around it?

ADD REPLYlink written 7.8 years ago by Mycroft34110

See the help for download.file.
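
That help page describes proxy settings through environment variables. A minimal sketch of setting them from inside R, assuming a hypothetical proxy address (proxy.example.org:3128 is a placeholder) and that the variables are set before the first download of the session:

```r
# Hypothetical proxy address; substitute your institute's real host and port.
proxy <- "http://proxy.example.org:3128"

# Set the variables for the current R session, before any download is attempted.
Sys.setenv(http_proxy = proxy, ftp_proxy = proxy)

# Confirm what R (and any child process it starts) will see.
proxy_vars <- Sys.getenv(c("http_proxy", "ftp_proxy"))
print(proxy_vars)
```

Whether a given download method honours these variables depends on the method, so this is a starting point rather than a guaranteed fix.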

ADD REPLYlink written 7.8 years ago by Sean Davis26k

I read it and set the proxy using

export HTTP_PROXY (and  the same for FTP_PROXY)

before running R. I also checked with Sys.getenv() that the proxy was set in R, and it was, but I still get the same error:

> getGEOSuppFiles('GSE20986')
[1] "ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE20986/"
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
line 1 did not have 6 elements

I also updated the packages, since I remembered that RCurl had been updated a few days ago. A new version was installed (with several warnings, but nothing else); I am now running RCurl version 1.95-0.1.

I also received this message:

Setting options('download.file.method.GEOquery'='curl')

after loading GEOquery. Does it mean that the Linux curl is used instead of RCurl?
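
For what it is worth, the "curl" method of download.file() runs the curl command-line program, not the RCurl package. A minimal sketch for inspecting and switching the option, assuming (not verified here) that GEOquery simply passes it on to download.file(), and that the wget binary is installed:

```r
# Option name as printed in the startup message quoted above.
old_method <- getOption("download.file.method.GEOquery")
print(old_method)  # NULL if GEOquery has not been loaded in this session

# "wget" is one of the methods documented for download.file(); whether it
# changes the failing step in getGEOSuppFiles() is untested.
options(download.file.method.GEOquery = "wget")
new_method <- getOption("download.file.method.GEOquery")
print(new_method)
```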

ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Mycroft34110

I don't know why, but I'm getting the same error message when trying to use biomaRt. My error is below; it looks very similar to yours. Commands:

library("biomaRt")

ensembl = useMart("ensembl",dataset="hsapiensgeneensembl")

affyids=c("202763at","209310sat","207500at")

getBM(attributes=c('affyhgu133plus2', 'entrezgene'), filters = 'affyhgu133plus2', values = affyids, mart = ensembl)

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 4 did not have 2 elements

R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.12.0      BiocInstaller_1.4.7

loaded via a namespace (and not attached):
[1] RCurl_1.95-0.1 tools_2.15.1   XML_3.95-0.1

ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Raony Guimarães1.1k

Since changing the RCurl version did not solve my problem (with GEOquery), I tried your case. I suggest that you edit your reply, since the attribute name passed to getBM is incorrect: it should be

"affy_hg_u133_plus_2"

instead of "affyhgu133plus2"; that makes reproducing your case a little difficult, since I had to look up the correct name.

The same applies to "hsapiensgeneensembl", which should be

"hsapiens_gene_ensembl".

Correction: to have the names displayed correctly in your message, you should format your R commands as code (by inserting 4 spaces in front of them); otherwise, the underscores are removed (at least in the comments).
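
Put together, the commands with the underscores restored would look roughly like this. It is a sketch, not a tested fix: the probe IDs are assumed to have lost their underscores in the same way as the other names, and the attribute names are the ones used in this thread.

```r
# Probe IDs with the underscores restored (assumption: they were stripped by
# the comment formatting, like the dataset and attribute names).
affyids <- c("202763_at", "209310_s_at", "207500_at")

# The query needs network access, so it only runs in an interactive session
# with biomaRt installed.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  # Attribute names as used in this thread; check
  # biomaRt::listAttributes(ensembl) if your Ensembl release differs.
  res <- biomaRt::getBM(
    attributes = c("affy_hg_u133_plus_2", "entrezgene"),
    filters    = "affy_hg_u133_plus_2",
    values     = affyids,
    mart       = ensembl
  )
  print(res)
}
```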

ADD REPLYlink modified 7.8 years ago • written 7.8 years ago by Mycroft34110
7.8 years ago by
Dublin / Ireland
Raony Guimarães1.1k wrote:

Finally, I solved the problem!

You need to download the previous version of RCurl http://cran.r-project.org/src/contrib/Archive/RCurl/RCurl_1.91-1.tar.gz

and install using the command:

install.packages("~/Downloads/RCurl_1.91-1.tar.gz", repos=NULL)

Thank you brentp for the insights!

ADD COMMENTlink modified 7.8 years ago • written 7.8 years ago by Raony Guimarães1.1k

Unfortunately, changing the RCurl version did not fix my own problem with the GEOquery package. Maybe another package needs to be rolled back to a previous version; does anyone have an idea which package it would be? (See sessionInfo in the original message.)

ADD REPLYlink written 7.8 years ago by Mycroft34110
5.7 years ago by
zhanxw20
United States
zhanxw20 wrote:

I manually download the files, and then manually import the data:

    gset <- getGEO(filename="GSE10246_series_matrix.txt.gz", GSEMatrix = TRUE)
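
The manual download step can also be scripted. A minimal sketch, assuming the "GSEnnn" directory layout of the GEO FTP site and a single-platform series; series_matrix_url() is a hypothetical helper, not a GEOquery function:

```r
# Build the series-matrix URL for a GEO series accession.
# Assumption: files live under .../geo/series/<stub>/<accession>/matrix/.
series_matrix_url <- function(acc) {
  # Replace the last (up to) three digits with "nnn": GSE10246 -> GSE10nnn
  stub <- sub("\\d{1,3}$", "nnn", acc, perl = TRUE)
  sprintf(
    "https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
    stub, acc, acc
  )
}

url <- series_matrix_url("GSE10246")
print(url)

# Download and import only in an interactive session with GEOquery installed.
if (interactive() && requireNamespace("GEOquery", quietly = TRUE)) {
  dest <- basename(url)
  download.file(url, destfile = dest, mode = "wb")
  gset <- GEOquery::getGEO(filename = dest, GSEMatrix = TRUE)
}
```

Series with several platforms use differently named matrix files, so the URL may need adjusting by hand in that case.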

 

ADD COMMENTlink written 5.7 years ago by zhanxw20