Cannot read Illumina data (.bgx and .txt format) in R for microarray analysis
7.0 years ago
deeptivipin ▴ 20

Hi All!

This is my first time working with Illumina datasets, and I'm stuck at loading the data. My files are in .bgx and .txt format, and they cannot be read by limma, lumi, or beadarray in R 3.1.1. Both file types give the following errors:

lumi:

 Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds

limma:

  Error in readGenericHeader(fname, columns = expr, sep = sep) : 
  Specified column headings not found in file

Would really appreciate your help.

Thanks in advance!

PS: link to the txt file

https://drive.google.com/file/d/0B7pQgf5qPm2gaThKTnNaMDcxMXM/edit?usp=sharing

 

microarray read illumina data • 6.0k views

You'll probably have to share at least a few lines of the txt file for us to make any suggestions.


Thanks Sean!

I've added a link to download the file.

 


Any solution found?


Yes. I found that my firewall was blocking the download after all.

7.0 years ago

Your txt is a file from NCBI GEO.  The easiest way to get data from GEO into R is to use the GEOquery package.

library(GEOquery)
# getGEO() returns a list of ExpressionSets for a GSE accession; take the first one
eset = getGEO('GSE28985')[[1]]

Now, eset is an ExpressionSet and you can use it with limma, etc.  See the Biobase vignette describing ExpressionSets if you want more detail.
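If getGEO() cannot be used (for example, because the download fails), a series matrix file that is already on disk can also be read by hand. The following is only a minimal sketch in base R, not GEOquery's own parser: `read_series_matrix` is a hypothetical helper name, and it assumes the usual series matrix layout in which every metadata line starts with "!" and the expression table sits between them with an ID_REF header row.

```r
# Hypothetical helper (not part of GEOquery): read the expression table
# from a local GEO series matrix file (.txt or .txt.gz).
read_series_matrix <- function(path) {
  # Assumption: all metadata lines start with "!", so treating "!" as the
  # comment character leaves only the header row and the data rows.
  tab <- read.delim(path, comment.char = "!", row.names = 1,
                    check.names = FALSE)
  # Rows are probe IDs, columns are sample (GSM) accessions.
  as.matrix(tab)
}
```

This returns a plain numeric matrix and discards the sample annotation, so it is a fallback for a quick look rather than a replacement for the ExpressionSet.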


I tried GEOquery as you mentioned. Still showing the same error.


So, if you run the code in my answer, you get an error?  Or were you using some other code?  If it was something else, what did you type and what was the error?


I ran the code you gave after installing the limma package. It showed the following:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/
Found 1 file(s)
GSE28985_series_matrix.txt.gz
Using locally cached version: C:\Users\DEEPTI\AppData\Local\Temp\Rtmpa4JYMi/GSE28985_series_matrix.txt.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) : 
  invalid 'nlines' argument
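One plausible reading of the output above is that an earlier, interrupted download left an incomplete GSE28985_series_matrix.txt.gz in the session's temporary directory, and the "locally cached version" that gets reused is that incomplete file. Under that assumption, deleting the cached copy before retrying forces a fresh download. The sketch below uses only base R; `clear_cached_matrix` is a hypothetical helper name, and it assumes the cache lives in tempdir(), as the path in the message suggests.

```r
# Hypothetical helper: delete cached series matrix downloads for one accession.
clear_cached_matrix <- function(accession, dir = tempdir()) {
  # Matches e.g. GSE28985_series_matrix.txt.gz and GSE28985-GPLxxxx_series_matrix.txt.gz
  pattern <- paste0("^", accession, "[-_].*series_matrix\\.txt\\.gz$")
  cached <- list.files(dir, pattern = pattern, full.names = TRUE)
  removed <- file.remove(cached)
  # Return (invisibly) the paths that were actually deleted.
  invisible(cached[removed])
}

clear_cached_matrix("GSE28985")
# Afterwards, rerun getGEO('GSE28985') so the file is downloaded again.
```

Restarting R has a similar effect, since each session gets its own temporary directory. Neither step helps if the download itself is still being blocked.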

 

 


And what is the output of sessionInfo() after loading the GEOquery library?  


Actually, this was the initial result of running

eset = getGEO('GSE28985')[[1]]

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/
Found 1 file(s)
GSE28985_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/GSE28985_series_matrix.txt.gz'
using Synchronous WinInet calls
Error in download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/GSE28985_series_matrix.txt.gz'
In addition: Warning message:
In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  InternetOpenUrl failed: 'The FTP session was terminated

When I tried it again, I got what I posted above.

The output of sessionInfo() is:

 

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.20.8        GEOquery_2.30.1     affy_1.42.3         Biobase_2.24.0      BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] affyio_1.32.0         BiocInstaller_1.14.2  preprocessCore_1.26.1 RCurl_1.95-4.3       
[5] tools_3.1.0           XML_3.98-1.1          zlibbioc_1.10.0     

 



This gets you already-normalized data. How do you get the raw data from the txt file?


Downloading the supplemental files from the GEO website (or with GEOquery) will get you the raw data files. From there, you'll have to work out the best way to parse them.
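A reasonable first step in working out how to parse such a file is to look at its header row. limma's "Specified column headings not found in file" error suggests the reader was looking for column names that the file does not contain. The sketch below uses only base R to find the first line containing a given key and split it into column names. `peek_columns` and its arguments are hypothetical names, and the default key "AVG_Signal" is an assumption based on the default expression column of limma's read.ilmn(), so adjust it to whatever your file actually uses.

```r
# Hypothetical helper: locate the header row of a tab-delimited text file
# and return its column names, plus the number of lines to skip before it.
peek_columns <- function(path, key = "AVG_Signal", n = 50, sep = "\t") {
  lines <- readLines(path, n = n, warn = FALSE)
  hit <- grep(key, lines, fixed = TRUE)
  if (length(hit) == 0) {
    return(NULL)  # key not found in the first n lines
  }
  header <- strsplit(lines[hit[1]], sep, fixed = TRUE)[[1]]
  list(skip = hit[1] - 1, columns = header)
}
```

If this returns NULL for every key you try, the file probably does not contain per-sample signal columns (a .bgx file, for instance, is a platform annotation file, not expression data). Otherwise the returned column names show what to pass to the reader's column arguments.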
