Sample code for GSE79404
# general config
baseDir <- 'GSE79404/'
bgxfile <- 'Annot/HumanHT-12_V4_0_R2_15002873_B.bgx'
targetsfile <- 'targets.txt'
options(scipen = 99)
# read in the data and convert the data to an EListRaw object, which is a data object for single channel data
x <- read.table(paste0(baseDir, 'raw/GSE79404_non-normalized.txt'),
header = TRUE, sep = '\t', stringsAsFactors = FALSE, skip = 4)
newnames <- colnames(read.table(paste0(baseDir, 'raw/GSE79404_non-normalized.txt'),
header = TRUE, sep = '\t', stringsAsFactors = FALSE, skip = 3))
detectionpvalues <- x[,grep('Detection.Pval', colnames(x))]
x <- x[,-grep('Detection.Pval', colnames(x))]
probes <- x$ID_REF
x <- data.matrix(x[,2:ncol(x)])
rownames(x) <- probes
colnames(x) <- gsub('^X', '', colnames(x))
colnames(x) <- newnames[grep('GSM', newnames)]
# read in annotation
annot <- illuminaio::readBGX(bgxfile)$probes
annot <- annot[,which(colnames(annot) %in% c('Source','Symbol','Transcript','ILMN_Gene','RefSeq_ID',
'Cytoband', 'Definition', 'Ontology_Component', 'Ontology_Process',
annot <- annot[which(annot$Probe_Id %in% rownames(x)),]
annot <- annot[match(rownames(x), annot$Probe_Id),]
# update the target info
targetinfo <- readTargets(targetsfile, sep = '\t')
rownames(targetinfo) <- targetinfo$IDATfile
x <- x[,match(rownames(targetinfo), colnames(x))]
if (!all(colnames(x) == rownames(targetinfo)))
stop('Target info is not aligned to expression data - they must be in the same order')
# create a custom EListRaw object
project <- new('EListRaw')
project@.Data[] <- 'illumina'
project@.Data[] <- targetinfo
#project@.Data[] <- annot
project@.Data[] <- NULL
project@.Data[] <- x
project@.Data[] <- NULL
project$E <- x
project$targets <- targetinfo
#project$genes <- annot
project$genes <- NULL
project$other$Detection <- detectionpvalues
# generate QC plots of raw intensities
# for BeadArrays, background correction and normalisation are handled by a single function: neqc()
# this is the same as per Agilent single colour arrays
# perform background correction on the fluorescent intensities
# 'normexp' is beneficial in that it doesn't result in negative values, meaning no data is lost
# for ideal offset, see Gordon Smyth's answer, here: https://stat.ethz.ch/pipermail/bioconductor/2006-April/012554.html
# normalize the data with the 'quantile' method, to be consistent with RMA for Affymetrix arrays
project.bgcorrect.norm <- neqc(project, offset = 16)
# filter out control probes, those with no symbol, and those that failed
annot <- annot[which(annot$Probe_Id %in% rownames(project.bgcorrect.norm)),]
project.bgcorrect.norm <- project.bgcorrect.norm[which(rownames(project.bgcorrect.norm) %in% annot$Probe_Id),]
annot <- annot[match(rownames(project.bgcorrect.norm), annot$Probe_Id),]
project.bgcorrect.norm@.Data[] <- annot
project.bgcorrect.norm$genes <- annot
Control <- project.bgcorrect.norm$genes$Source=="ILMN_Controls"
NoSymbol <- project.bgcorrect.norm$genes$Symbol == ""
isexpr <- rowSums(project.bgcorrect.norm$other$Detection <= 0.05) >= 3
project.bgcorrect.norm.filt <- project.bgcorrect.norm[!Control & !NoSymbol & isexpr, ]
# remove annotation columns we no longer need
project.bgcorrect.norm.filt$genes <- project.bgcorrect.norm.filt$genes[,c(
# summarise across genes by mean
# ID is used to identify the replicates
project.bgcorrect.norm.filt.mean <- avereps(project.bgcorrect.norm.filt,
ID = project.bgcorrect.norm.filt$genes$Symbol)
targets.txt should look like:
Hey Kevin! I follow the instructions from your workflow, and it's going well! But, when i tried to execute bc correct and norm, R give me this warning messages:
Oh, fala português? Cadê o código que esta tentando de executar? Pode responder em inglês
Hey Kevin, yes! I am a brazilian student! i found where i losted my mind!
now, i set ID_REF as rownames, and created an ElisRaw. Now, i proceed with bccorrection and normalisation.But R provides me this warning: Note: inferring mean and variance of negative control probe intensities from the detection p-values.
how to proceed now?
one more question: What is offset, and why 16 ? :)
Obrigado, mas qual linea especificamente produz o erro?
I'm still confused about offset=16 (sorry about this)
When i try to proceed with # filter out control probes, those with no symbol, and those that failed# I put:
And R give me this warning: Error in object[[a]][i, , drop = FALSE] : número incorreto de dimensões
In this case, what is my fail ?
for my analysis i need this stages ? - filter out control probes, those with no symbol, and those that failed - summarise across genes by mean
You may not need these stages / steps.
Before executing (running) these lines of code, you should have already run:
What is the output of
offsetis added to the expression values after the background correction so that no negative values appear in the data.
when i run :
r provide's me a new elist, and this warning : Note: inferring mean and variance of negative control probe intensities from the detection p-values.
about the second question:
what data i should have in:
Control <- raw_data.bgcorrect.norm$genes$controls=="ILMN_Controls"
NoSymbol <- raw_data.bgcorrect.norm$genes$probes == " " ?
There is evidently a problem with your annotation object,
From the other thread, did you perform this part?
Your annotation object has to be 100% / perfectly aligned with your expression data.
If you want, you can avoid the filtering steps involving
isexpr. The Note that appears after you run
neqc()indicates that the normalisation process is already utilising the detection p-values; so, this is good news.
16 is a [somewhat] random / aleatório number, but that recommended by Professor Gordon Smyth.
Please keep in mind that you can avoid this advanced procedure by obtaining the normalised data automatically via GEO2R:
If you are a beginner in this, there is no reason why you need to go through this advanced process.
I'm afraid not... to import the BGX data, I used: library(illuminaio)
After, I just proceed with your workflow:
I really now that, but I am a very insterest ... so I whant to learning sorry if I'm boring you : (
Basically, the annotation always has to be perfectly aligned to the expression data based on the probe IDs. This is even more important in this case because this is a very advanced procedure, but you indicated that you are a beginner with no local help (?).
You can align objects via match(), which(), or via dplyr: https://www.infoworld.com/article/3454356/how-to-merge-data-in-r-using-r-merge-dplyr-or-datatable.html
As you may understand, for different reasons, I have hesitancy to just provide all code to you.
There's no problem with this... I will read this article and trying to understand . Unfortunately I just have a few people help me( also trying understand ). Thanks for help me
Hey Kevin! I found some articles and I'm watching some classes (videos) about.
I hope return to talk with you soon :)
Hey Kevin! Sorry for my late response!
Finally i used the merge function , to align the probes with the expression matrix (as annot). When I try to merge the controls with the expression matrix , R provides me: Error in merge.data.frame(raw_int, controls, by.x = "Probe_Id", all.x = TRUE, : 'by.x' e 'by.y' especificam números de colunas diferentes I think is because the expression matrix doesn't have the controls probe ID.
So I go on with background correction / normalisation and only (expression matrix) has normalized values, the others don't . Is this correct ?
Olá! Não sei o que está acontecendo, mas, sim, é possivél que é relacionado com os probes de controle. Eu coloquei código em uma nova resposta. Pode tentar de usar ese?
Sim! Tento agora e te retorno!