Question: Suggestions for filtering probes in Affymetrix data
5 days ago by liux.bio (China) wrote:

Hi, Biostars.

I am analyzing an Affymetrix microarray dataset. After some searching, I found the following blog posts about filtering probes:

Based on these posts, I settled on the following filtering criteria:

  • For each sample, compute the 95th percentile of the negative-control probe sets (normgene->intron).
  • A gene is considered detected in a sample if its value is greater than that sample's 95th percentile of the negative controls.
  • If a gene is detected in more than half of the samples, keep it; otherwise, filter it out.
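The three criteria can be written as a few vectorized base-R operations. This is a minimal sketch on a small simulated matrix; `expr`, `ncRows` and `detectFilter()` are hypothetical names, and the negative-control rows are simply assumed to be known. Note that "more than half" corresponds to a strict `>`, whereas the `filterGeneDetected()` function in the code later in this post uses `>=` (at least half); the two differ only for probe sets detected in exactly half of the samples.

```r
# Hypothetical helper: keep rows whose value exceeds the per-sample
# `prob` quantile of the negative-control rows in more than `ratio` of samples.
detectFilter <- function(expr, ncRows, prob = 0.95, ratio = 0.5) {
  # 1. Per-sample threshold: quantile of the negative-control rows in each column
  thresholds <- apply(expr[ncRows, , drop = FALSE], 2, quantile, probs = prob)
  # 2. Detection calls: compare every row against its own column's threshold
  detected <- sweep(expr, 2, thresholds, FUN = ">")
  # 3. Keep rows detected in strictly more than `ratio` of the samples
  rowMeans(detected) > ratio
}

# Toy data: 4 samples; rows nc1..nc20 play the role of negative controls
set.seed(1)
expr <- matrix(rnorm(23 * 4, mean = 3), nrow = 23,
               dimnames = list(c(paste0("nc", 1:20), "g1", "g2", "g3"),
                               paste0("s", 1:4)))
expr["g1", ] <- 10                # far above background in every sample
expr["g2", ] <- c(10, 10, 0, 0)   # detected in exactly half of the samples
expr["g3", ] <- 0                 # never detected
keep <- detectFilter(expr, ncRows = paste0("nc", 1:20))
keep[c("g1", "g2", "g3")]         # g1 TRUE, g2 FALSE, g3 FALSE
```

With `>=` in step 3, as in `filterGeneDetected()`, `g2` would be kept as well.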

After filtering, 6543 of the 53617 probe sets remain. Is this filtering criterion reasonable? Any suggestions would be appreciated.
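For scale, the share of probe sets that pass the filter can be computed directly from the two counts reported above:

```r
retained <- 6543   # probe sets left after filtering
total    <- 53617  # all probe sets in the RMA output
round(100 * retained / total, 1)  # percent kept: 12.2
```

So roughly one probe set in eight passes the filter.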

Here is my code:

library(oligo)
# load data 
celFiles <- list.celfiles("./rawData/GSE83452/celFiles/", full.names = TRUE)
rawData <- read.celfiles(celFiles)
# RMA: background correction, quantile normalization and summarization
processedData <- rma(rawData)

# filter genes with low expression
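library(pd.hugene.2.0.st)
con <- db(pd.hugene.2.0.st)
# negative-control probe sets (normgene->intron, featureSet type 10)
NCprobes <- dbGetQuery(con, "select meta_fsetid from core_mps inner join featureSet on
 core_mps.fsetid=featureSet.fsetid where featureSet.type='10';")
# keep each transcript-cluster ID once, and only IDs present in the RMA output
NCids <- intersect(unique(as.character(NCprobes[, 1])), rownames(processedData))
# the 95th percentile of the negative-control probe sets in each sample
NCquantileValue <- apply(exprs(processedData)[NCids, , drop = FALSE], 2, quantile,
 probs = 0.95)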

# function to filter genes
filterGeneDetected <- function(sampleValues, quantileValues, detectedRatio) {
    # sampleValues and quantileValues must have the same length (one value per sample)
    # In a sample, a gene is detected if its value is above that sample's quantileValue
    # The gene is selected if it is detected in at least detectedRatio * (number of samples)
    stopifnot(length(sampleValues) == length(quantileValues))
    detectedNum <- sum(sampleValues > quantileValues)
    detectedNum / length(sampleValues) >= detectedRatio
}

detectedProbes <- apply(exprs(processedData), 1, 
                    filterGeneDetected, 
                    quantileValues = NCquantileValue,
                    detectedRatio = 0.5)
processedDataFilter <- processedData[detectedProbes, ]
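One way to judge whether the cutoff is reasonable is to look at how the number of retained probe sets responds to the percentile and the detection ratio. The sketch below runs on a simulated matrix and uses a hypothetical helper `countKept()`; with real data one would pass `exprs(processedData)` and the negative-control IDs instead of the toy `expr` and `ncRows`.

```r
# Hypothetical helper: number of rows kept for a given percentile and ratio
countKept <- function(expr, ncRows, prob, ratio) {
  thresholds <- apply(expr[ncRows, , drop = FALSE], 2, quantile, probs = prob)
  detected <- sweep(expr, 2, thresholds, FUN = ">")
  sum(rowMeans(detected) >= ratio)   # ">=" matches filterGeneDetected()
}

# Simulated data: 200 background-like rows (the first 50 serve as negative
# controls) plus 100 rows with higher expression, 6 samples
set.seed(42)
expr <- rbind(matrix(rnorm(200 * 6, mean = 3), ncol = 6),
              matrix(rnorm(100 * 6, mean = 7), ncol = 6))
rownames(expr) <- paste0("row", seq_len(nrow(expr)))
ncRows <- rownames(expr)[1:50]

# Grid of cutoffs; expand.grid varies `prob` fastest
grid <- expand.grid(prob = c(0.90, 0.95, 0.99), ratio = c(0.25, 0.5, 0.75))
grid$kept <- mapply(function(p, r) countKept(expr, ncRows, p, r),
                    grid$prob, grid$ratio)
grid
```

Because `quantile()` is non-decreasing in `probs`, the counts can only shrink as either cutoff is tightened, so the table mainly shows how sensitive the result is to the chosen values.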

Many thanks for your kindness.
