Question: nsFilter() almost always returns the same number of filtered genes across different microarray datasets

arronar (Austria) wrote, 2.6 years ago:

Hello.

I'm about to process around 6 different microarray datasets and am currently working on the filtering step, using the nsFilter() function from the genefilter library.

After running the raw data through the RMA algorithm, I use the following code, with a custom function, to remove the filtered genes.

library(genefilter)  # provides nsFilter()
library(Biobase)     # provides featureNames() for ExpressionSet objects

# Keep only the probe sets of rma.data that survived nsFilter().
# nsFilter()$eset holds the features that PASSED the filters, so the genes
# to remove are the ones NOT in it (the result has the same features as
# frma.filtered$eset itself).
remove.filtered.genes <- function(filtered.data, rma.data) {
  keep <- featureNames(rma.data) %in% featureNames(filtered.data)
  rma.data[keep, ]
}

frma.filtered <- nsFilter(frma.data, require.entrez = FALSE, remove.dupEntrez = FALSE)
frma.data <- remove.filtered.genes(frma.filtered$eset, frma.data)

However, I noticed that for 5 of the 6 datasets, frma.filtered$filter.log is identical:

$numLowVar
[1] 27307

$feature.exclude
[1] 62

Is this expected behaviour that depends on the microarray chip, or is it a fault or misuse on my part?

Thank you.

(Question last modified 2.6 years ago by Kevin Blighe.)

Kevin Blighe wrote, 2.6 years ago:

The nsFilter() function decides which transcripts to remove by looking at all samples together, so you should end up with a single list of removed genes for each filter criterion.
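A likely reason the counts repeat: with its defaults (var.func = IQR, var.cutoff = 0.5, filterByQuantile = TRUE), nsFilter() flags as low-variance the probe sets whose IQR is at or below the median IQR, so numLowVar depends on how many probe sets remain after feature.exclude rather than on the expression values. The base-R sketch below is a simplified imitation of that rule; count_low_var() is a hypothetical helper, not part of genefilter, and the probe-set count assumes HG-U133 Plus 2.0 arrays (54,675 probe sets, 62 of them AFFX controls), which is an assumption about the data.

```r
# Simplified, hypothetical imitation of nsFilter()'s default variance filter
# (var.func = IQR, var.cutoff = 0.5, filterByQuantile = TRUE), in base R only.
# count_low_var() is a made-up helper, not part of genefilter.
count_low_var <- function(expr, cutoff = 0.5) {
  iqrs <- apply(expr, 1, IQR)                  # IQR of each probe set across samples
  threshold <- quantile(iqrs, probs = cutoff)  # with cutoff = 0.5, the median IQR
  sum(iqrs <= threshold)                       # probe sets at or below the cutoff
}

# Assumption: HG-U133 Plus 2.0 arrays, i.e. 54,675 probe sets minus the
# 62 AFFX control probe sets removed by feature.exclude = "^AFFX".
n_probesets <- 54675 - 62

set.seed(1)
# Two unrelated simulated datasets: different values, sample sizes and distributions.
ds1 <- matrix(rnorm(n_probesets * 6, mean = 7, sd = 1), ncol = 6)
ds2 <- matrix(rexp(n_probesets * 10, rate = 0.2), ncol = 10)

low_var <- c(ds1 = count_low_var(ds1), ds2 = count_low_var(ds2))
print(low_var)
```

Both simulated datasets should give (54,613 + 1) / 2 = 27,307, the same numLowVar reported in the question, because the median of an odd number of IQR values is itself one of those values. Data from a different chip type would give a different count.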

I have never used nsFilter myself and prefer to manage the QC of my microarrays manually (old-fashioned, I know); however, I'm not alone in my skepticism of this function: see the Biostars question "Filtering array data with nsFilter".

Kevin


Thank you very much for your answer. Do you think that the following approach is better?

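library(genefilter)
# eset: the RMA-normalised ExpressionSet (log2 scale)
f1 <- pOverA(0.25, log2(100))     # at least 25% of samples with values above log2(100)
f2 <- function(x) (IQR(x) > 0.5)  # interquartile range above 0.5
ff <- filterfun(f1, f2)           # a probe set must pass both filters
selected <- genefilter(eset, ff)  # logical vector, one entry per probe set
sum(selected)                     # number of probe sets retained
esetSub <- eset[selected, ]       # subset to the retained probe sets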
(Reply written 2.6 years ago by arronar.)
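Unlike nsFilter()'s default variance filter, which removes probe sets at or below a quantile of the IQR (var.cutoff = 0.5 with filterByQuantile = TRUE), the thresholds in this genefilter code are absolute: pOverA(0.25, log2(100)) keeps probe sets where at least 25% of samples exceed log2(100), and f2 requires an IQR above 0.5, so the number retained does depend on the data. The base-R sketch below imitates that combination on two simulated datasets with different spread; count_kept_absolute() is a hypothetical helper, not genefilter's implementation.

```r
# Hypothetical base-R imitation of filterfun(pOverA(0.25, log2(100)), IQR > 0.5),
# so it runs without genefilter installed.
count_kept_absolute <- function(expr, p = 0.25, A = log2(100), iqr_min = 0.5) {
  expressed <- rowMeans(expr > A) >= p        # pOverA-style: proportion >= p of samples above A
  variable  <- apply(expr, 1, IQR) > iqr_min  # same rule as f2
  sum(expressed & variable)                   # probe sets passing both filters
}

set.seed(1)
n <- 5000  # simulated probe sets (illustrative only)
low_spread  <- matrix(rnorm(n * 6, mean = 7, sd = 0.2), ncol = 6)
high_spread <- matrix(rnorm(n * 6, mean = 7, sd = 1.0), ncol = 6)

kept <- c(low = count_kept_absolute(low_spread),
          high = count_kept_absolute(high_spread))
print(kept)
```

With these settings the low-spread matrix should retain far fewer probe sets than the high-spread one, whereas a median-based cutoff would flag about half of each matrix as low-variance regardless of spread.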

To answer that, I will ask you a question: why do you feel the need to apply these filters to your data? The main microarray data processing algorithm, RMA normalisation, is designed to deal with virtually all issues related to data distribution and background noise. Are you noticing further issues with your data after normalisation?

(Reply written 2.6 years ago by Kevin Blighe.)