This is not an automated workflow. - please take the code and adapt it to your own situation and review each and every command.
-----------
You can read them into your R environment manually, via read.table()
, read.csv()
, or something else. Unfortunately, for Illumina bead arrays, authors do not seem to typically upload the raw data idat files, meaning that neither the illuminaio package nor limma::read.idat()
in R can be used at the initial data-reading step.
For example, take a look, assuming that the .txt.gz files are in GSE19435_RAW/:
require(data.table)
x <- lapply(
list.files('GSE19435_RAW', pattern = '\\.txt.gz$', full.names = TRUE),
function(x) data.table::fread(file = x))
x <- do.call(cbind, x)
x[1:10,1:10]
ID_REF VALUE Detection PVal ID_REF VALUE
1: ILMN_1673358 -1.97104200 0.60079050 ILMN_1673358 -2.9239160
2: ILMN_1687840 106.76140000 0.00000000 ILMN_1687840 41.8389700
3: ILMN_1762337 -1.40243500 0.55467720 ILMN_1762337 -9.5888790
4: ILMN_2055271 0.01790681 0.42687750 ILMN_2055271 -0.4168698
5: ILMN_1736007 6.08818900 0.11857710 ILMN_1736007 13.7985000
6: ILMN_2383229 -1.05695300 0.51778660 ILMN_2383229 -6.0397830
7: ILMN_1806310 0.69702590 0.37944660 ILMN_1806310 11.3878900
8: ILMN_1779670 7.32470400 0.07905138 ILMN_1779670 2.9194760
9: ILMN_2321282 3.26678900 0.21343870 ILMN_2321282 -2.5747600
10: ILMN_1671474 8.45006700 0.06587616 ILMN_1671474 8.9773250
Detection PVal ID_REF VALUE Detection PVal ID_REF
1: 0.73254280 ILMN_1673358 -6.7453220 0.940711400 ILMN_1673358
2: 0.00000000 ILMN_1687840 61.9178000 0.001317523 ILMN_1687840
3: 0.98418970 ILMN_1762337 -5.3033740 0.876152800 ILMN_1762337
4: 0.50461130 ILMN_2055271 11.6580100 0.027667980 ILMN_2055271
5: 0.01054018 ILMN_1736007 -1.7626750 0.592885400 ILMN_1736007
6: 0.91831360 ILMN_2383229 -4.9718470 0.860342600 ILMN_2383229
7: 0.01844532 ILMN_1806310 3.4798970 0.175230600 ILMN_1806310
8: 0.22397890 ILMN_1779670 -1.2728680 0.538866900 ILMN_1779670
9: 0.70092230 ILMN_2321282 -0.9937009 0.507246400 ILMN_2321282
10: 0.03425560 ILMN_1671474 6.3072600 0.088274050 ILMN_1671474
You should then extract out the Detection PVal columns and save them for later, and also set the rownames of the object to be equal to ID_REF
. The final x
object should be just the expression levels, and it should be a data-matrix.
If you then read in the BGX file and align it to your x
object (as annot
), and also read in your targets file and align that to x
(as targetinfo
), you can then create a custom EListRaw object:
project <- new('EListRaw')
project@.Data[[1]] <- 'illumina'
project@.Data[[2]] <- targetinfo
project@.Data[[3]] <- annot
project@.Data[[4]] <- x
project@.Data[[5]] <- NULL
project$E <- x
project$targets <- targetinfo
project$genes <- annot
project$other$Detection <- detectionpvalues
Then:
Background correction / normalisation
project.bgcorrect.norm <- neqc(project, offset = 16)
filter out control probes, those with no symbol, and those that failed
Control <- project.bgcorrect.norm$genes$Source=="ILMN_Controls"
NoSymbol <- project.bgcorrect.norm$genes$Symbol == ""
isexpr <- rowSums(detectionpvalues <= 0.05) >= 3
project.bgcorrect.norm.filt <- project.bgcorrect.norm[!Control & !NoSymbol & isexpr, ]
dim(project.bgcorrect.norm)
dim(project.bgcorrect.norm.filt)
summarise across genes by mean
# ID is used to identify the replicates
project.bgcorrect.norm.filt.mean <- avereps(project.bgcorrect.norm.filt,
ID = project.bgcorrect.norm.filt$genes$Symbol)
dim(project.bgcorrect.norm.filt.mean)
Kevin
Thank you very much Kevin for the support which is a blessing right now. I have one other question related to this I did the first part of combining as you said but for the next part I can find the .bgx file in the raw tar folder but not this targets file which I need to convert into targetinfo. Can you tell me more about it please. Thanks again.
Kevin does this targets file happens to be the pdata? phenotype data associated with arrays?I appreciate your response.
Yes and no. The targets file should be ordered as per the columns of your data matrix of expression values. It usually contains the conditions that you want to test in a differential expression analysis. It can be regarded as phenotype data.
got it thanks!you saved a lot of time. Appreciate it.