Question

illumina Arrays Illumina HumanHT-12 V3.0 expression beadchip reading data

3

Entering edit mode

4.5 years ago

ASid ▴ 40

Hi All

So I have this GEO illumina data that i need to analyze with GEO ID GSE19435. This is a Illumina HumanHT-12 V3.0 expression beadchip . I want to analyze the non normalized data which is usually provided as non_normalized separate series file with this name for example you can find one in GSE42825.

I have analyzed previously Illumina HumanHT-12 V4.0 expression beadchip and when I want to download their raw data it comes as a single txt file /.txt.gz non normalized attached in supplementary data in GEO with specifically in the name mentioned that it is raw or non normalized data.

But for this GSE19435 (which is Illumina HumanHT-12 V3.0 ) when I download the raw data , it gives me a single file for every sample individually. I could verify that the data is definitely raw and non normalized but I have no clue which function allows me to let it read in a combined way all of these files as a single expression data file.

Thanks for your help in advance. I will appreciate if you could give me the exact code.

microarray Illumina • 4.4k views

ADD COMMENT • link updated 4.5 years ago by Kevin Blighe 87k • written 4.5 years ago by ASid ▴ 40

score 3 · Answer 1 · 2019-10-16

This is not an automated workflow. - please take the code and adapt it to your own situation and review each and every command.

-----------

You can read them into your R environment manually, via read.table(), read.csv(), or something else. Unfortunately, for Illumina bead arrays, authors do not seem to typically upload the raw data idat files, meaning that neither the illuminaio package nor limma::read.idat() in R can be used at the initial data-reading step.

For example, take a look, assuming that the .txt.gz files are in GSE19435_RAW/:

require(data.table)
x <- lapply(
  list.files('GSE19435_RAW', pattern = '\\.txt.gz$',  full.names = TRUE),
  function(x) data.table::fread(file = x))
x <- do.call(cbind, x)
x[1:10,1:10]
          ID_REF        VALUE Detection PVal       ID_REF      VALUE
 1: ILMN_1673358  -1.97104200     0.60079050 ILMN_1673358 -2.9239160
 2: ILMN_1687840 106.76140000     0.00000000 ILMN_1687840 41.8389700
 3: ILMN_1762337  -1.40243500     0.55467720 ILMN_1762337 -9.5888790
 4: ILMN_2055271   0.01790681     0.42687750 ILMN_2055271 -0.4168698
 5: ILMN_1736007   6.08818900     0.11857710 ILMN_1736007 13.7985000
 6: ILMN_2383229  -1.05695300     0.51778660 ILMN_2383229 -6.0397830
 7: ILMN_1806310   0.69702590     0.37944660 ILMN_1806310 11.3878900
 8: ILMN_1779670   7.32470400     0.07905138 ILMN_1779670  2.9194760
 9: ILMN_2321282   3.26678900     0.21343870 ILMN_2321282 -2.5747600
10: ILMN_1671474   8.45006700     0.06587616 ILMN_1671474  8.9773250
    Detection PVal       ID_REF      VALUE Detection PVal       ID_REF
 1:     0.73254280 ILMN_1673358 -6.7453220    0.940711400 ILMN_1673358
 2:     0.00000000 ILMN_1687840 61.9178000    0.001317523 ILMN_1687840
 3:     0.98418970 ILMN_1762337 -5.3033740    0.876152800 ILMN_1762337
 4:     0.50461130 ILMN_2055271 11.6580100    0.027667980 ILMN_2055271
 5:     0.01054018 ILMN_1736007 -1.7626750    0.592885400 ILMN_1736007
 6:     0.91831360 ILMN_2383229 -4.9718470    0.860342600 ILMN_2383229
 7:     0.01844532 ILMN_1806310  3.4798970    0.175230600 ILMN_1806310
 8:     0.22397890 ILMN_1779670 -1.2728680    0.538866900 ILMN_1779670
 9:     0.70092230 ILMN_2321282 -0.9937009    0.507246400 ILMN_2321282
10:     0.03425560 ILMN_1671474  6.3072600    0.088274050 ILMN_1671474

You should then extract out the Detection PVal columns and save them for later, and also set the rownames of the object to be equal to ID_REF. The final x object should be just the expression levels, and it should be a data-matrix.

If you then read in the BGX file and align it to your x object (as annot), and also read in your targets file and align that to x (as targetinfo), you can then create a custom EListRaw object:

project <- new('EListRaw')
project@.Data[[1]] <- 'illumina'
project@.Data[[2]] <- targetinfo
project@.Data[[3]] <- annot
project@.Data[[4]] <- x
project@.Data[[5]] <- NULL
project$E <- x
project$targets <- targetinfo
project$genes <- annot
project$other$Detection <- detectionpvalues

Then:

Background correction / normalisation

project.bgcorrect.norm <- neqc(project, offset = 16)

filter out control probes, those with no symbol, and those that failed

Control <- project.bgcorrect.norm$genes$Source=="ILMN_Controls"
NoSymbol <- project.bgcorrect.norm$genes$Symbol == ""
isexpr <- rowSums(detectionpvalues <= 0.05) >= 3
project.bgcorrect.norm.filt <- project.bgcorrect.norm[!Control & !NoSymbol & isexpr, ]
dim(project.bgcorrect.norm)
dim(project.bgcorrect.norm.filt)

summarise across genes by mean

# ID is used to identify the replicates
project.bgcorrect.norm.filt.mean <- avereps(project.bgcorrect.norm.filt,
  ID = project.bgcorrect.norm.filt$genes$Symbol)
dim(project.bgcorrect.norm.filt.mean)

Kevin