illumina Arrays Illumina HumanHT-12 V3.0 expression beadchip reading data
1
3
Entering edit mode
4.5 years ago
ASid ▴ 40

Hi All

So I have this GEO illumina data that i need to analyze with GEO ID GSE19435. This is a Illumina HumanHT-12 V3.0 expression beadchip . I want to analyze the non normalized data which is usually provided as non_normalized separate series file with this name for example you can find one in GSE42825.

I have analyzed previously Illumina HumanHT-12 V4.0 expression beadchip and when I want to download their raw data it comes as a single txt file /.txt.gz non normalized attached in supplementary data in GEO with specifically in the name mentioned that it is raw or non normalized data.

But for this GSE19435 (which is Illumina HumanHT-12 V3.0 ) when I download the raw data , it gives me a single file for every sample individually. I could verify that the data is definitely raw and non normalized but I have no clue which function allows me to let it read in a combined way all of these files as a single expression data file.

Thanks for your help in advance. I will appreciate if you could give me the exact code.

microarray Illumina • 4.4k views
ADD COMMENT
3
Entering edit mode
4.5 years ago

This is not an automated workflow. - please take the code and adapt it to your own situation and review each and every command.

-----------

You can read them into your R environment manually, via read.table(), read.csv(), or something else. Unfortunately, for Illumina bead arrays, authors do not seem to typically upload the raw data idat files, meaning that neither the illuminaio package nor limma::read.idat() in R can be used at the initial data-reading step.

For example, take a look, assuming that the .txt.gz files are in GSE19435_RAW/:

require(data.table)
x <- lapply(
  list.files('GSE19435_RAW', pattern = '\\.txt.gz$',  full.names = TRUE),
  function(x) data.table::fread(file = x))
x <- do.call(cbind, x)
x[1:10,1:10]
          ID_REF        VALUE Detection PVal       ID_REF      VALUE
 1: ILMN_1673358  -1.97104200     0.60079050 ILMN_1673358 -2.9239160
 2: ILMN_1687840 106.76140000     0.00000000 ILMN_1687840 41.8389700
 3: ILMN_1762337  -1.40243500     0.55467720 ILMN_1762337 -9.5888790
 4: ILMN_2055271   0.01790681     0.42687750 ILMN_2055271 -0.4168698
 5: ILMN_1736007   6.08818900     0.11857710 ILMN_1736007 13.7985000
 6: ILMN_2383229  -1.05695300     0.51778660 ILMN_2383229 -6.0397830
 7: ILMN_1806310   0.69702590     0.37944660 ILMN_1806310 11.3878900
 8: ILMN_1779670   7.32470400     0.07905138 ILMN_1779670  2.9194760
 9: ILMN_2321282   3.26678900     0.21343870 ILMN_2321282 -2.5747600
10: ILMN_1671474   8.45006700     0.06587616 ILMN_1671474  8.9773250
    Detection PVal       ID_REF      VALUE Detection PVal       ID_REF
 1:     0.73254280 ILMN_1673358 -6.7453220    0.940711400 ILMN_1673358
 2:     0.00000000 ILMN_1687840 61.9178000    0.001317523 ILMN_1687840
 3:     0.98418970 ILMN_1762337 -5.3033740    0.876152800 ILMN_1762337
 4:     0.50461130 ILMN_2055271 11.6580100    0.027667980 ILMN_2055271
 5:     0.01054018 ILMN_1736007 -1.7626750    0.592885400 ILMN_1736007
 6:     0.91831360 ILMN_2383229 -4.9718470    0.860342600 ILMN_2383229
 7:     0.01844532 ILMN_1806310  3.4798970    0.175230600 ILMN_1806310
 8:     0.22397890 ILMN_1779670 -1.2728680    0.538866900 ILMN_1779670
 9:     0.70092230 ILMN_2321282 -0.9937009    0.507246400 ILMN_2321282
10:     0.03425560 ILMN_1671474  6.3072600    0.088274050 ILMN_1671474

You should then extract out the Detection PVal columns and save them for later, and also set the rownames of the object to be equal to ID_REF. The final x object should be just the expression levels, and it should be a data-matrix.

If you then read in the BGX file and align it to your x object (as annot), and also read in your targets file and align that to x (as targetinfo), you can then create a custom EListRaw object:

project <- new('EListRaw')
project@.Data[[1]] <- 'illumina'
project@.Data[[2]] <- targetinfo
project@.Data[[3]] <- annot
project@.Data[[4]] <- x
project@.Data[[5]] <- NULL
project$E <- x
project$targets <- targetinfo
project$genes <- annot
project$other$Detection <- detectionpvalues

Then:

Background correction / normalisation

project.bgcorrect.norm <- neqc(project, offset = 16)

filter out control probes, those with no symbol, and those that failed

Control <- project.bgcorrect.norm$genes$Source=="ILMN_Controls"
NoSymbol <- project.bgcorrect.norm$genes$Symbol == ""
isexpr <- rowSums(detectionpvalues <= 0.05) >= 3
project.bgcorrect.norm.filt <- project.bgcorrect.norm[!Control & !NoSymbol & isexpr, ]
dim(project.bgcorrect.norm)
dim(project.bgcorrect.norm.filt)

summarise across genes by mean

# ID is used to identify the replicates
project.bgcorrect.norm.filt.mean <- avereps(project.bgcorrect.norm.filt,
  ID = project.bgcorrect.norm.filt$genes$Symbol)
dim(project.bgcorrect.norm.filt.mean)

Kevin

ADD COMMENT
0
Entering edit mode

Thank you very much Kevin for the support which is a blessing right now. I have one other question related to this I did the first part of combining as you said but for the next part I can find the .bgx file in the raw tar folder but not this targets file which I need to convert into targetinfo. Can you tell me more about it please. Thanks again.

ADD REPLY
0
Entering edit mode

Kevin does this targets file happens to be the pdata? phenotype data associated with arrays?I appreciate your response.

ADD REPLY
1
Entering edit mode

Yes and no. The targets file should be ordered as per the columns of your data matrix of expression values. It usually contains the conditions that you want to test in a differential expression analysis. It can be regarded as phenotype data.

ADD REPLY
1
Entering edit mode

got it thanks!you saved a lot of time. Appreciate it.

ADD REPLY

Login before adding your answer.

Traffic: 2378 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6