Question: gene filtering for agilent microarray using Limma
1
gravatar for rohit
4.9 years ago by
rohit30
India
rohit30 wrote:

Hello everyone,

I'm analysing agilent microarray data using limma package. I want to do gene filter to remove the probes that do not give any expression values. Is there any workflow to filter genes in limma package?

Thank you in advance.

ADD COMMENTlink modified 22 months ago by Kevin Blighe61k • written 4.9 years ago by rohit30

Hi Rohit,

Please, how you did do to filter, I ask the same question?

Thanks for help,

ADD REPLYlink written 3.7 years ago by Bouzid.a0

Hi Rohit, Please, how you did do to filter, I ask the same question? Thanks for help,

ADD REPLYlink written 3.7 years ago by Bouzid.a0

sad, nothing improvement?

ADD REPLYlink written 22 months ago by BioLite20
1
gravatar for Kevin Blighe
22 months ago by
Kevin Blighe61k
University College London
Kevin Blighe61k wrote:

In microarrays, expression is measured as fluorescence intensities from probes that hybridise to their target cDNA sequences. Virtually all probes will fluoresce and return values.

Genes that are not expressed may fall into the background noise (or be slightly above it), the level for which is then corrected during RMA normalisation, the typical normalisation strategy. Background noise can be inferred from, for example, the expression gained from negative control probes. In RNA-seq, conversely, genes that are not expressed will have 0 count values.

You should ideally not do any filtering prior to running limma. Only probes that have been shown to repeatedly fail should be removed. Probes covering genes that you definitively know to not be expressed in your tissue of study could also be removed. It can be inferred that, after normalisation, those genes with the lowest values are reflective of genes that have virtually nil expression.

Kevin

ADD COMMENTlink modified 22 months ago • written 22 months ago by Kevin Blighe61k

OH! So happy! Kevin answered this question! I am a newbie and have been concerned about Agilent single channel microarray analysis question for a long time. But I still bothered by some questions. All remarks on Biostar show Kevin you are always kindful and helpful, may I borrow some your precious time for informing me something important but I ignored? Thanks from my heart! I am trying to replicate the Agilent part of this article:https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002794. Followed by its referred package Limma and Agi4x44PreProcess, I couldn't reach the article's results. I get a fault probe filter number and also get an error mention in the after analysis, my code as following:

library(limma)
#loading data
targets<-read.delim("E-MTAB-812.sdrf.txt")
dd<-targets[,c("Array.Data.File","Factor.Value.DISEASE_STATE.","Characteristics.RNA.Integrity.Number.")]
raw_data<-read.maimages(targets[,"Array.Data.File"],source = "agilent",green.only = TRUE,other.columns =c("gIsWellAboveBG","gIsSaturated","gIsFeatPopnOL","gIsFeatNonUnifOL") )

#Subtract the background
bg_data<-backgroundCorrect(raw_data,method = "normexp")
#Normalize between the arrays
nm_data<-normalizeBetweenArrays(bg_data,method = "quantile")
nm_data_avg<-avereps(nm_data,ID=nm_data$genes$ProbeName)
dim(nm_data)

#QC report
eSet<-new("ExpressionSet", exprs = nm_data$E,annotation=nm_data$genes$GeneName)
QC_report_nor<-arrayQualityMetrics(eSet)

#probe filter
Control<-nm_data$genes$ControlType==1
IsExpr<- rowSums(nm_data$other$gIsWellAboveBG>0) <=40
fil_data1<-nm_data[!Control&!IsExpr,]
fil_data2<-avereps(fil_data1,ID=fil_data1$genes$ProbeName)
dim(fil_data2)

#gene annotation
library(clusterProfiler)
library(hgug4112a.db)
tr_gene<-fil_data2$genes$ProbeName
Symbol<-bitr(tr_gene,fromType = "PROBEID",toType = "SYMBOL",OrgDb = hgug4112a.db,drop = FALSE)
EntrezID<-bitr(tr_gene,fromType = "PROBEID",toType = "ENTREZID",OrgDb = hgug4112a.db,drop = FALSE)
fil_data2$genes<-cbind(fil_data2$genes,Symbol,EntrezID)
fil_data2$genes<-fil_data2$genes[,c("PROBEID","SYMBOL","ENTREZID")]

##remove NA
no_symbol<-is.na(fil_data2$genes$SYMBOL)
no_entr<-is.na(fil_data2$genes$ENTREZID)
fil_data_rm<-fil_data2[!no_symbol&!no_entr,]

#DEG filter
rownames(fil_data_rm$E)<-fil_data_rm$genes$SYMBOL
fil_data_sy<-fil_data_rm$E
treatment<-targets[,"Factor.Value.DISEASE_STATE."]
TR_levels<-c("neurologically healthy control","Parkinson disease")
Group<-factor(treatment,levels = TR_levels)
design2<- model.matrix(~Group+0)
colnames(design2) <- c("Control","PD")

fita<- lmFit(fil_data_rm$E, design2)
cont.matrix <- makeContrasts(Diff=PD-Control, levels=design2)##!!!colnames could not too long -R can't identify
fitb<- contrasts.fit(fita, cont.matrix)
fitc<- eBayes(fitb)

Result_A<-topTable(fitc,adjust="BH",coef = "Diff",genelist =fil_data_rm$genes$SYMBOL ,number = Inf,p.value = 0.05)
summary(decideTests(fitc),lfc = 1.5)
de_gen<-decideTests(fitc,lfc =1.5)

Thanks for your precious time again!

ADD REPLYlink modified 22 months ago by zx87549.4k • written 22 months ago by BioLite20

Hey, thank you for the comments. I try my best - that's all.

The lack of reproducibility of experiments is a very common problem in research. The study that you are using was published 6 years ago, so, new versions of limma and other packages have been released since then. This means that, starting from the raw data, you are highly unlikely to obtain the exact same values / results. However, generally, the interpretation of the results should be the same.

Some things for you:

  • did you try to start from the Processed data that the authors used? THis will contain the normalised data originally produced by the authors.
  • did you closely follow the methods in the manuscript?

In relation to the methods, I see key phrases like:

  • "Nine outlier samples were detected based on the arrayQualityMetrics default criteria and were dropped from further analyses"
  • "Microarray probes were removed if they had expression values outside the detectable spike-in range in more than 50% of the control arrays and more than 50% of the PD arrays, or if they had any of the Agilent flags IsWellAboveBG = 0, gIsSaturated = 1, gIsFeatPopnOL = 1, gIsFeatNonUnifOL = 1 in more than 75% of the arrays."
  • "The median expression value was used for replicated probes that passed the above filtering criteria. A total of 39,122 probes out of the total 45,015 probes present on the microarray chips were analyzed in the expression and eSNP studies. The expression data for the retained probes of the 53 arrays (E-MTAB-812 ArrayExpress dataset) were quantile normalized, and the obtained values were base 2 logarithm transformed."

Looking at your code, I am not sure that you did pre-filtering? They appear to have removed a lot of probes that were assumed to have failed, which is a practice performed in microarray analyses (as I mention above).

ADD REPLYlink modified 22 months ago • written 22 months ago by Kevin Blighe61k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1566 users visited in the last hour