gene filtering for agilent microarray using Limma
1
4
Entering edit mode
6.8 years ago
rohit ▴ 60

Hello everyone,

I'm analysing agilent microarray data using limma package. I want to do gene filter to remove the probes that do not give any expression values. Is there any workflow to filter genes in limma package?

agilent microarray limma gene filtering R • 4.5k views
0
Entering edit mode

Hi Rohit,

Please, how you did do to filter, I ask the same question?

Thanks for help,

0
Entering edit mode

Hi Rohit, Please, how you did do to filter, I ask the same question? Thanks for help,

0
Entering edit mode

4
Entering edit mode
3.7 years ago

In microarrays, expression is measured as fluorescence intensities from probes that hybridise to their target cDNA sequences. Virtually all probes will fluoresce and return values.

Genes that are not expressed may fall into the background noise (or be slightly above it), the level for which is then corrected during RMA normalisation, the typical normalisation strategy. Background noise can be inferred from, for example, the expression gained from negative control probes. In RNA-seq, conversely, genes that are not expressed will have 0 count values.

You should ideally not do any filtering prior to running limma. Only probes that have been shown to repeatedly fail should be removed. Probes covering genes that you definitively know to not be expressed in your tissue of study could also be removed. It can be inferred that, after normalisation, those genes with the lowest values are reflective of genes that have virtually nil expression.

Kevin

0
Entering edit mode

OH! So happy! Kevin answered this question! I am a newbie and have been concerned about Agilent single channel microarray analysis question for a long time. But I still bothered by some questions. All remarks on Biostar show Kevin you are always kindful and helpful, may I borrow some your precious time for informing me something important but I ignored? Thanks from my heart! I am trying to replicate the Agilent part of this article:https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002794. Followed by its referred package Limma and Agi4x44PreProcess, I couldn't reach the article's results. I get a fault probe filter number and also get an error mention in the after analysis, my code as following:

library(limma)
dd<-targets[,c("Array.Data.File","Factor.Value.DISEASE_STATE.","Characteristics.RNA.Integrity.Number.")]
raw_data<-read.maimages(targets[,"Array.Data.File"],source = "agilent",green.only = TRUE,other.columns =c("gIsWellAboveBG","gIsSaturated","gIsFeatPopnOL","gIsFeatNonUnifOL") )

#Subtract the background
bg_data<-backgroundCorrect(raw_data,method = "normexp")
#Normalize between the arrays
nm_data<-normalizeBetweenArrays(bg_data,method = "quantile")
nm_data_avg<-avereps(nm_data,ID=nm_data$genes$ProbeName)
dim(nm_data)

#QC report
eSet<-new("ExpressionSet", exprs = nm_data$E,annotation=nm_data$genes$GeneName) QC_report_nor<-arrayQualityMetrics(eSet) #probe filter Control<-nm_data$genes$ControlType==1 IsExpr<- rowSums(nm_data$other$gIsWellAboveBG>0) <=40 fil_data1<-nm_data[!Control&!IsExpr,] fil_data2<-avereps(fil_data1,ID=fil_data1$genes$ProbeName) dim(fil_data2) #gene annotation library(clusterProfiler) library(hgug4112a.db) tr_gene<-fil_data2$genes$ProbeName Symbol<-bitr(tr_gene,fromType = "PROBEID",toType = "SYMBOL",OrgDb = hgug4112a.db,drop = FALSE) EntrezID<-bitr(tr_gene,fromType = "PROBEID",toType = "ENTREZID",OrgDb = hgug4112a.db,drop = FALSE) fil_data2$genes<-cbind(fil_data2$genes,Symbol,EntrezID) fil_data2$genes<-fil_data2$genes[,c("PROBEID","SYMBOL","ENTREZID")] ##remove NA no_symbol<-is.na(fil_data2$genes$SYMBOL) no_entr<-is.na(fil_data2$genes$ENTREZID) fil_data_rm<-fil_data2[!no_symbol&!no_entr,] #DEG filter rownames(fil_data_rm$E)<-fil_data_rm$genes$SYMBOL
fil_data_sy<-fil_data_rm$E treatment<-targets[,"Factor.Value.DISEASE_STATE."] TR_levels<-c("neurologically healthy control","Parkinson disease") Group<-factor(treatment,levels = TR_levels) design2<- model.matrix(~Group+0) colnames(design2) <- c("Control","PD") fita<- lmFit(fil_data_rm$E, design2)
cont.matrix <- makeContrasts(Diff=PD-Control, levels=design2)##!!!colnames could not too long -R can't identify
fitb<- contrasts.fit(fita, cont.matrix)
fitc<- eBayes(fitb)

Result_A<-topTable(fitc,adjust="BH",coef = "Diff",genelist =fil_data_rm$genes$SYMBOL ,number = Inf,p.value = 0.05)
summary(decideTests(fitc),lfc = 1.5)
de_gen<-decideTests(fitc,lfc =1.5)


Thanks for your precious time again!

0
Entering edit mode

Hey, thank you for the comments. I try my best - that's all.

The lack of reproducibility of experiments is a very common problem in research. The study that you are using was published 6 years ago, so, new versions of limma and other packages have been released since then. This means that, starting from the raw data, you are highly unlikely to obtain the exact same values / results. However, generally, the interpretation of the results should be the same.

Some things for you:

• did you try to start from the Processed data that the authors used? THis will contain the normalised data originally produced by the authors.
• did you closely follow the methods in the manuscript?

In relation to the methods, I see key phrases like:

• "Nine outlier samples were detected based on the arrayQualityMetrics default criteria and were dropped from further analyses"
• "Microarray probes were removed if they had expression values outside the detectable spike-in range in more than 50% of the control arrays and more than 50% of the PD arrays, or if they had any of the Agilent flags IsWellAboveBG = 0, gIsSaturated = 1, gIsFeatPopnOL = 1, gIsFeatNonUnifOL = 1 in more than 75% of the arrays."
• "The median expression value was used for replicated probes that passed the above filtering criteria. A total of 39,122 probes out of the total 45,015 probes present on the microarray chips were analyzed in the expression and eSNP studies. The expression data for the retained probes of the 53 arrays (E-MTAB-812 ArrayExpress dataset) were quantile normalized, and the obtained values were base 2 logarithm transformed."

Looking at your code, I am not sure that you did pre-filtering? They appear to have removed a lot of probes that were assumed to have failed, which is a practice performed in microarray analyses (as I mention above).