Question: Cannot read Illumina data (.bgx and .txt format) in R for microarray analysis
deeptivipin20 (United States) wrote 5.3 years ago:

Hi All!

This is my first time working with Illumina datasets and I'm stuck at loading the data. My files are in .bgx and .txt format, and I cannot read them with limma, lumi or beadarray in R 3.1.1. Both file types give the following errors:

lumi:

 Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds

limma:

  Error in readGenericHeader(fname, columns = expr, sep = sep) : 
  Specified column headings not found in file

Would really appreciate your help.

Thanks in advance!

PS: link to the txt file

https://drive.google.com/file/d/0B7pQgf5qPm2gaThKTnNaMDcxMXM/edit?usp=sharing

 

modified 5.3 years ago by Sean Davis25k • written 5.3 years ago by deeptivipin20

You'll probably have to share at least a few lines of the txt file for us to make any suggestions.
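A quick way to produce those lines, and to check whether the file really is tab-delimited (the lumi error above comes from a search for tab characters in the first data line), might look like the sketch below. `peek_file` is a made-up helper name, not part of limma, lumi or beadarray, and the path in the usage line is a placeholder.

```r
# peek_file() is a hypothetical helper, not a function from any package.
# It returns the first n lines of a text file and the number of
# tab-separated fields found on the first n non-blank lines.
peek_file <- function(path, n = 5) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  # quote = "" and comment.char = "" keep quotes and '#' from altering the
  # field count; count.fields() skips blank lines by default.
  n_fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  list(lines = first_lines, fields = head(n_fields, n))
}

# Usage (placeholder path):
# peek_file("my_illumina_file.txt")
```

If every line reports a single field, the file is probably not tab-delimited, or not plain text at all, which would be consistent with both errors.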

written 5.3 years ago by Sean Davis25k

Thanks Sean!

I've added a link to download the file.

 

written 5.3 years ago by deeptivipin20

Any solution found?

written 5.0 years ago by Zhilong Jia1.5k

Yes. I found that my firewall was blocking the download after all.

written 5.0 years ago by deeptivipin20
Sean Davis25k (National Institutes of Health, Bethesda, MD) wrote 5.3 years ago:

Your txt is a file from NCBI GEO.  The easiest way to get data from GEO into R is to use the GEOquery package.

library(GEOquery)
eset = getGEO('GSE28985')[[1]]

Now, eset is an ExpressionSet and you can use it with limma, etc.  See the Biobase vignette describing ExpressionSets if you want more detail.
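For readers new to ExpressionSets, the accessors most often needed before moving on to limma are sketched below. To stay independent of the network, the example builds a small toy ExpressionSet with invented probe names, sample names and values rather than downloading GSE28985; the same accessors should apply to the object returned by getGEO().

```r
library(Biobase)

# Toy data standing in for the object returned by getGEO(); the probe and
# sample names and the values are invented for illustration.
mat <- matrix(c(7.1, 8.3, 6.9, 7.4, 8.0, 7.2), nrow = 3,
              dimnames = list(c("probe1", "probe2", "probe3"),
                              c("sampleA", "sampleB")))
eset <- ExpressionSet(assayData = mat)

dim(exprs(eset))     # expression matrix: probes in rows, samples in columns
sampleNames(eset)    # column (sample) identifiers
featureNames(eset)   # row (probe) identifiers
pData(eset)          # per-sample annotation (no columns for this toy object)
```

With a real GEO series, pData(eset) usually holds the sample descriptions needed to build a design matrix.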

written 5.3 years ago by Sean Davis25k

I tried GEOquery as you suggested, but it is still showing the same error.

written 5.3 years ago by deeptivipin20

So, if you run the code in my answer, you get an error?  Or were you using some other code?  If it was something else, what did you type and what was the error?

written 5.3 years ago by Sean Davis25k

I ran the code you gave after installing the limma package. It showed the following:

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/
Found 1 file(s)
GSE28985_series_matrix.txt.gz
Using locally cached version: C:\Users\DEEPTI\AppData\Local\Temp\Rtmpa4JYMi/GSE28985_series_matrix.txt.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) : 
  invalid 'nlines' argument
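A "Using locally cached version" message followed by a read.table error is consistent with a truncated or empty file left in the temporary directory by an earlier failed download, although that is an inference rather than something the output proves. A minimal check, assuming GEO series matrix files end with the line `!series_matrix_table_end`, might look like this; `has_table_end` is a made-up helper name, not a GEOquery function.

```r
# has_table_end() is a hypothetical helper. It returns TRUE when the
# (optionally gzipped) file can be read and its last non-blank line is the
# series-matrix end marker, and FALSE otherwise.
has_table_end <- function(path) {
  if (!file.exists(path) || file.info(path)$size == 0) return(FALSE)
  con <- gzfile(path, open = "rt")   # gzfile() also reads uncompressed text
  on.exit(close(con))
  lines <- tryCatch(suppressWarnings(readLines(con)),
                    error = function(e) character(0))
  lines <- lines[grepl("\\S", lines)]   # drop blank lines
  length(lines) > 0 && identical(tail(lines, 1), "!series_matrix_table_end")
}

# Usage (placeholder path): if the check fails, delete the cached copy and
# download again, or fetch the file in a browser and parse it locally.
# cached <- file.path(tempdir(), "GSE28985_series_matrix.txt.gz")
# if (!has_table_end(cached)) unlink(cached)
# eset <- GEOquery::getGEO(filename = cached)
```

Restarting R also gives a fresh temporary directory, so a broken cached copy from a previous session should not be picked up again.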

 

 

modified 5.3 years ago • written 5.3 years ago by deeptivipin20

And what is the output of sessionInfo() after loading the GEOquery library?  

written 5.3 years ago by Sean Davis25k

Actually, this was the initial result of running

eset = getGEO('GSE28985')[[1]]

ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/
Found 1 file(s)
GSE28985_series_matrix.txt.gz
trying URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/GSE28985_series_matrix.txt.gz'
using Synchronous WinInet calls
Error in download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  : 
  cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE28nnn/GSE28985/matrix/GSE28985_series_matrix.txt.gz'
In addition: Warning message:
In download.file(sprintf("ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s",  :
  InternetOpenUrl failed: 'The FTP session was terminated

When I tried it again, I got what I posted above.

The output of sessionInfo() is:

 

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.20.8        GEOquery_2.30.1     affy_1.42.3         Biobase_2.24.0      BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] affyio_1.32.0         BiocInstaller_1.14.2  preprocessCore_1.26.1 RCurl_1.95-4.3       
[5] tools_3.1.0           XML_3.98-1.1          zlibbioc_1.10.0     

 


written 5.2 years ago by deeptivipin20

This gets you already-normalized data. How do you get the raw data from the txt file?

written 26 days ago by salamandra240

Downloading the supplemental files from the GEO website (or using GEOquery) will get you the files. From there, you'll have to determine the best way to parse them.
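The GEOquery route might be sketched as follows. `fetch_supp_files` is a made-up wrapper name; `getGEOSuppFiles()` is the GEOquery function doing the work. Whether this particular series provides non-normalized data among its supplementary files is not something the sketch can guarantee.

```r
# fetch_supp_files() is a hypothetical wrapper around GEOquery's
# getGEOSuppFiles(); it returns the local paths of the downloaded files.
fetch_supp_files <- function(gse, dest = getwd()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("The GEOquery package is required")
  }
  # Downloads all supplementary files of the series into dest/<gse>/ and
  # returns a data frame whose row names are the local file paths.
  supp <- GEOquery::getGEOSuppFiles(gse, baseDir = dest)
  rownames(supp)
}

# Usage (needs network access):
# files <- fetch_supp_files("GSE28985")
# readLines(files[1], n = 5)   # for a text or gzipped text file: inspect the
#                              # header before choosing a parser
```

Looking at the first few lines of each file is usually enough to tell which column names to pass to a reader such as limma's read.ilmn().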

written 26 days ago by Sean Davis25k