Sample code for GSE79404
#!/usr/bin/Rscript
# general config
baseDir <- 'GSE79404/'
bgxfile <- 'Annot/HumanHT-12_V4_0_R2_15002873_B.bgx'
targetsfile <- 'targets.txt'
setwd(baseDir)
options(scipen = 99)
require(limma)
require(RColorBrewer)
require(PCAtools)
# read in the data and convert the data to an EListRaw object, which is a data object for single channel data
x <- read.table(paste0(baseDir, 'raw/GSE79404_non-normalized.txt'),
header = TRUE, sep = '\t', stringsAsFactors = FALSE, skip = 4)
newnames <- colnames(read.table(paste0(baseDir, 'raw/GSE79404_non-normalized.txt'),
header = TRUE, sep = '\t', stringsAsFactors = FALSE, skip = 3))
detectionpvalues <- x[,grep('Detection.Pval', colnames(x))]
x <- x[,-grep('Detection.Pval', colnames(x))]
probes <- x$ID_REF
x <- data.matrix(x[,2:ncol(x)])
rownames(x) <- probes
colnames(x) <- gsub('^X', '', colnames(x))
colnames(x) <- newnames[grep('GSM', newnames)]
# read in annotation
annot <- illuminaio::readBGX(bgxfile)$probes
annot <- annot[,which(colnames(annot) %in% c('Source','Symbol','Transcript','ILMN_Gene','RefSeq_ID',
'Entrez_Gene_ID','Symbol','Protein_Product','Probe_Id','Probe_Type',
'Probe_Start','Chromosome','Probe_Chr_Orientation','Probe_Coordinates',
'Cytoband', 'Definition', 'Ontology_Component', 'Ontology_Process',
'Ontology_Function', 'Synonyms'))]
annot <- annot[which(annot$Probe_Id %in% rownames(x)),]
annot <- annot[match(rownames(x), annot$Probe_Id),]
# update the target info
targetinfo <- readTargets(targetsfile, sep = '\t')
rownames(targetinfo) <- targetinfo$IDATfile
x <- x[,match(rownames(targetinfo), colnames(x))]
if (!all(colnames(x) == rownames(targetinfo)))
stop('Target info is not aligned to expression data - they must be in the same order')
# create a custom EListRaw object
project <- new('EListRaw')
project@.Data[[1]] <- 'illumina'
project@.Data[[2]] <- targetinfo
#project@.Data[[3]] <- annot
project@.Data[[3]] <- NULL
project@.Data[[4]] <- x
project@.Data[[5]] <- NULL
project$E <- x
project$targets <- targetinfo
#project$genes <- annot
project$genes <- NULL
project$other$Detection <- detectionpvalues
# generate QC plots of raw intensities
... ...
# for BeadArrays, background correction and normalisation are handled by a single function: neqc()
# this is the same as per Agilent single colour arrays
#
# perform background correction on the fluorescent intensities
# 'normexp' is beneficial in that it doesn't result in negative values, meaning no data is lost
# for ideal offset, see Gordon Smyth's answer, here: https://stat.ethz.ch/pipermail/bioconductor/2006-April/012554.html
# normalize the data with the 'quantile' method, to be consistent with RMA for Affymetrix arrays
project.bgcorrect.norm <- neqc(project, offset = 16)
# filter out control probes, those with no symbol, and those that failed
annot <- annot[which(annot$Probe_Id %in% rownames(project.bgcorrect.norm)),]
project.bgcorrect.norm <- project.bgcorrect.norm[which(rownames(project.bgcorrect.norm) %in% annot$Probe_Id),]
annot <- annot[match(rownames(project.bgcorrect.norm), annot$Probe_Id),]
project.bgcorrect.norm@.Data[[3]] <- annot
project.bgcorrect.norm$genes <- annot
Control <- project.bgcorrect.norm$genes$Source=="ILMN_Controls"
NoSymbol <- project.bgcorrect.norm$genes$Symbol == ""
isexpr <- rowSums(project.bgcorrect.norm$other$Detection <= 0.05) >= 3
project.bgcorrect.norm.filt <- project.bgcorrect.norm[!Control & !NoSymbol & isexpr, ]
dim(project.bgcorrect.norm)
dim(project.bgcorrect.norm.filt)
# remove annotation columns we no longer need
project.bgcorrect.norm.filt$genes <- project.bgcorrect.norm.filt$genes[,c(
'Probe_Id',
'Definition','Ontology_Component','Ontology_Process','Ontology_Function',
'Chromosome','Probe_Coordinates','Cytoband','Probe_Chr_Orientation',
'RefSeq_ID','Entrez_Gene_ID','Symbol')]
head(project.bgcorrect.norm.filt$genes)
# summarise across genes by mean
# ID is used to identify the replicates
project.bgcorrect.norm.filt.mean <- avereps(project.bgcorrect.norm.filt,
ID = project.bgcorrect.norm.filt$genes$Symbol)
dim(project.bgcorrect.norm.filt.mean)
targets.txt should look like:
IDATfile Group
GSM2094831 Healthy
GSM2094832 Healthy
GSM2094833 Disease
GSM2094834 Healthy
... ...
Hey Kevin! I follow the instructions from your workflow, and it's going well! But, when i tried to execute bc correct and norm, R give me this warning messages:
Oh, fala português? Cadê o código que esta tentando de executar? Pode responder em inglês
Hey Kevin, yes! I am a brazilian student! i found where i losted my mind!
now, i set ID_REF as rownames, and created an ElisRaw. Now, i proceed with bccorrection and normalisation.But R provides me this warning: Note: inferring mean and variance of negative control probe intensities from the detection p-values.
how to proceed now?
one more question: What is offset, and why 16 ? :)
Obrigado, mas qual linea especificamente produz o erro?
Hey Kevin!
I'm still confused about offset=16 (sorry about this)
When i try to proceed with # filter out control probes, those with no symbol, and those that failed# I put:
And R give me this warning: Error in object[[a]][i, , drop = FALSE] : número incorreto de dimensões
In this case, what is my fail ?
for my analysis i need this stages ? - filter out control probes, those with no symbol, and those that failed - summarise across genes by mean
You may not need these stages / steps.
Before executing (running) these lines of code, you should have already run:
What is the output of
?
The
offset
is added to the expression values after the background correction so that no negative values appear in the data.when i run :
r provide's me a new elist, and this warning : Note: inferring mean and variance of negative control probe intensities from the detection p-values.
about the second question:
what data i should have in:
Control <- raw_data.bgcorrect.norm$genes$controls=="ILMN_Controls"
NoSymbol <- raw_data.bgcorrect.norm$genes$probes == " " ?
There is evidently a problem with your annotation object,
raw_data.bgcorrect.norm$genes
.From the other thread, did you perform this part?
?
Your annotation object has to be 100% / perfectly aligned with your expression data.
--------------
If you want, you can avoid the filtering steps involving
Control
,NoSymbol
, andisexpr
. The Note that appears after you runneqc()
indicates that the normalisation process is already utilising the detection p-values; so, this is good news.16 is a [somewhat] random / aleatório number, but that recommended by Professor Gordon Smyth.
Please keep in mind that you can avoid this advanced procedure by obtaining the normalised data automatically via GEO2R:
If you are a beginner in this, there is no reason why you need to go through this advanced process.
I'm afraid not... to import the BGX data, I used: library(illuminaio)
After, I just proceed with your workflow:
I really now that, but I am a very insterest ... so I whant to learning sorry if I'm boring you : (
Basically, the annotation always has to be perfectly aligned to the expression data based on the probe IDs. This is even more important in this case because this is a very advanced procedure, but you indicated that you are a beginner with no local help (?).
You can align objects via match(), which(), or via dplyr: https://www.infoworld.com/article/3454356/how-to-merge-data-in-r-using-r-merge-dplyr-or-datatable.html
As you may understand, for different reasons, I have hesitancy to just provide all code to you.
There's no problem with this... I will read this article and trying to understand . Unfortunately I just have a few people help me( also trying understand ). Thanks for help me
Any luck?
Hey Kevin! I found some articles and I'm watching some classes (videos) about.
I hope return to talk with you soon :)
Hey Kevin! Sorry for my late response!
Finally i used the merge function , to align the probes with the expression matrix (as annot). When I try to merge the controls with the expression matrix , R provides me: Error in merge.data.frame(raw_int, controls, by.x = "Probe_Id", all.x = TRUE, : 'by.x' e 'by.y' especificam números de colunas diferentes I think is because the expression matrix doesn't have the controls probe ID.
So I go on with background correction / normalisation and only (expression matrix) has normalized values, the others don't . Is this correct ?
Olá! Não sei o que está acontecendo, mas, sim, é possivél que é relacionado com os probes de controle. Eu coloquei código em uma nova resposta. Pode tentar de usar ese?
Sim! Tento agora e te retorno!