Hello.
I'm about to process around 6 different microarray datasets, and I'm currently working on the filtering step using the nsFilter() function from the genefilter package.
After running the raw data through the RMA algorithm, I use the following code, which includes a custom function, to remove the filtered genes:
library(genefilter)

# Remove from rma.data every row whose name also appears in filtered.data.
remove.filtered.genes <- function(filtered.data, rma.data) {
  # seq_len() avoids iterating over 1:0 when filtered.data has no rows
  for (i in seq_len(nrow(filtered.data))) {
    index <- which(row.names(rma.data) == row.names(filtered.data)[i])
    if (length(index) == 1) {
      rma.data <- rma.data[-index, ]
    } else {
      # Reached for zero matches as well as for duplicates
      print(paste0("[Err] Problem with ", row.names(filtered.data)[i], ": it matched ", length(index), " rows in the original (unfiltered) data instead of exactly one"))
    }
  }
  rma.data
}

frma.filtered <- nsFilter(frma.data, require.entrez=FALSE, remove.dupEntrez=FALSE)
frma.data <- remove.filtered.genes(filtered.data, frma.data)
I have now noticed that for 5 of the 6 datasets, the content of frma.filtered$filter.log is identical:
$numLowVar
[1] 27307
$feature.exclude
[1] 62
Is this expected because it depends on the microarray chip, or is it a mistake or misuse on my part?
Thank you.
Thank you very much for your answer. Do you think that the following approach is better?
To answer that, I will ask you a question: why do you feel the need to apply these filters to your data? The main microarray data processing algorithm, i.e., RMA normalisation, is designed to deal with virtually all issues related to data distribution and background noise. Are you noticing further issues with your data after normalisation?
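As for the identical filter.log values in the original question: as far as I recall, nsFilter() by default drops control probes by name pattern and then removes features below a quantile of the per-feature IQR (var.cutoff = 0.5 with filterByQuantile = TRUE), so please check those defaults against the genefilter documentation. If that is right, the counts depend only on how many features the chip has, not on the expression values, and datasets from the same platform would report the same numbers. Below is a minimal base-R sketch of that idea. It does not call genefilter, the helper count.low.var() is hypothetical, and the matrices are simulated.

```r
# Base-R sketch (not genefilter itself): quantile-based variance filtering.
set.seed(42)

# Hypothetical helper: count rows whose IQR is at or below the given quantile.
count.low.var <- function(mat, cutoff = 0.5) {
  iqr <- apply(mat, 1, IQR)
  sum(iqr <= quantile(iqr, probs = cutoff))
}

# Made-up feature count; both simulated datasets share it,
# like two studies run on the same chip.
n.features <- 1001

# Two unrelated simulated expression matrices with different sample counts
dataset.a <- matrix(rnorm(n.features * 8,  mean = 7, sd = 1), nrow = n.features)
dataset.b <- matrix(rnorm(n.features * 12, mean = 9, sd = 2), nrow = n.features)

removed.a <- count.low.var(dataset.a)
removed.b <- count.low.var(dataset.b)

# The counts match because the cutoff is a quantile, not an absolute threshold
print(c(removed.a, removed.b))
```

If the sixth dataset reports different numbers, comparing annotation(frma.data) across the datasets should show whether it simply comes from a different platform.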