Filtering Gene Expression Data For Preprocessing (PM Only)
11.6 years ago
AngryBird ▴ 30

I am trying to understand how background adjustment with MAS5 in R relates to filtering. I have 9 samples, all run on MoGene-1_0-st-v1.r3. This chip has only PM intensities.

Here is what I actually want to achieve:

  1. Determine the highest intensity value that is considered equivalent to no expression, using the PM intensity values.
  2. Remove a gene from the data if none of the samples has a value above this threshold.

My questions are:

  1. Is this included in background adjustment (the RMA function in R)?
  2. Is this a sensible way of filtering data, or are there better alternatives?

I am aware of the function:

nsFilter(exprSet_rma, require.entrez=TRUE, remove.dupEntrez=FALSE, var.filter=TRUE, var.cutoff=LOG2_EXPRESSION_MEASURE_CUT_OFF_BY_QUANTILE)$eset

but I am unsure what LOG2_EXPRESSION_MEASURE_CUT_OFF_BY_QUANTILE stands for and how to use it to do what I actually intend to do.

Is there a probe that gives this intensity threshold information, and how can I access it?

11.6 years ago
VS ▴ 730

If you read the help for nsFilter in R (just type ??nsFilter in the R console), you will find a full explanation. I will try to explain a bit here. About "LOG2_EXPRESSION_MEASURE_CUT_OFF_BY_QUANTILE": RMA reports the expression measure as log2 values, and the default statistic that nsFilter uses to decide which probesets to drop is the interquartile range (IQR) of each probeset across samples. Below is an excerpt from the help page:

The default var.func is IQR, which we here define as rowQ(eset, ceiling(0.75 * ncol(eset))) - rowQ(eset, floor(0.25 * ncol(eset))); this choice is motivated by the observation that unexpressed genes are detected most reliably through low variability of their features across samples. Additionally, IQR is robust to outliers (see note below). The default var.cutoff is 0.5 and is motivated by a rule of thumb that in many tissues only 40% of genes are expressed. Please adapt this value to your data and question.
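To make the excerpt concrete, here is a minimal base R sketch of the idea: compute a per-probeset IQR and keep the probesets above a chosen quantile of those values. It is not nsFilter's own code. The matrix is simulated, the names expr, row_iqr and var_cutoff are made up for illustration, and base R's IQR() may differ slightly from the rowQ-based definition quoted above.

```r
# Simulated log2 expression matrix: 1000 hypothetical probesets x 9 samples.
set.seed(1)
expr <- matrix(rnorm(1000 * 9, mean = 7, sd = 1), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), paste0("s", 1:9)))

# Spread of each probeset across samples, using base R's IQR().
row_iqr <- apply(expr, 1, IQR)

# Read the cutoff as a quantile of all IQR values (0.5 = the median IQR).
var_cutoff <- 0.5
threshold <- quantile(row_iqr, probs = var_cutoff)

# Keep only probesets whose IQR is strictly above that quantile.
keep <- row_iqr > threshold
expr_filtered <- expr[keep, , drop = FALSE]
dim(expr_filtered)
```

Note that a cutoff of 0.5 here means "drop the less variable half of the probesets", not "drop probesets with log2 expression below 0.5".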

Background adjustment processes the raw intensity values of each probe so as to remove the noise component and retain only the true signal. It has nothing to do with filtering, which means discarding probesets based on an expression threshold. I say this because it seems that at some level you are confusing filtering with background adjustment. To understand what the RMA and MAS5 background adjustment algorithms do, you can read the affy vignette.

So, to implement your own criterion for deciding 'no expression', you can supply your own var.func to nsFilter, which you have already identified.
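As a rough sketch of what such a criterion could look like, the base R code below keeps a probeset only if at least one sample exceeds an absolute log2 threshold, which is the filter described in the question. The data are simulated, and expr_threshold is a made-up value that you would have to choose from your own data.

```r
# Simulated log2 expression matrix: 500 hypothetical probesets x 9 samples.
set.seed(2)
expr <- matrix(rnorm(500 * 9, mean = 6, sd = 2), nrow = 500,
               dimnames = list(paste0("probe", 1:500), paste0("s", 1:9)))

# Hypothetical absolute cutoff on the log2 scale (an assumption, not a standard value).
expr_threshold <- 5

# Highest value of each probeset across the 9 samples.
row_max <- apply(expr, 1, max)

# Drop a probeset only if no sample is above the threshold.
keep <- row_max > expr_threshold
expr_expressed <- expr[keep, , drop = FALSE]

# Number of probesets dropped and kept.
table(kept = keep)
```

The same statistic could probably be passed to nsFilter as var.func = max together with filterByQuantile = FALSE, so that var.cutoff is read as an absolute value rather than a quantile; please check ?nsFilter for your genefilter version before relying on that.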


Thanks for answering my question. After some reading I realized how wrong I was. I would still like to ask one thing: when you implement your own function for deciding 'no expression', how is the threshold usually decided? If there is no rule, general guidelines would help as well. Thanks again.


For general guidelines, you can read the help pages of genefilter and the related publications. As mentioned above, the rule of thumb for expression arrays is that about half of the probesets will not be expressed. But you should look at several diagnostic plots of your data before settling on a cutoff.
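One possible diagnostic, sketched in base R on simulated data: look at the distribution of a per-probeset summary (here the median across samples) and at how the fraction of probesets kept changes with the cutoff. The two-population mixture, the variable names and the candidate cutoffs are all assumptions for illustration; real data will not separate this cleanly.

```r
# Simulate 2000 hypothetical probesets: half "unexpressed" (low), half "expressed" (high).
set.seed(3)
n_genes <- 2000
gene_mean <- c(rnorm(n_genes / 2, mean = 3, sd = 0.5),
               rnorm(n_genes / 2, mean = 8, sd = 1.5))

# 9 samples per probeset; the matrix is filled column by column,
# so repeating gene_mean once per sample lines the means up with the rows.
expr <- matrix(rnorm(n_genes * 9, mean = rep(gene_mean, 9), sd = 0.3),
               nrow = n_genes)

# Per-probeset median across samples.
row_median <- apply(expr, 1, median)

# Histogram of the medians; call plot(h) in an interactive session to view it.
h <- hist(row_median, breaks = 50, plot = FALSE)

# Fraction of probesets that would survive each candidate cutoff.
cutoffs <- seq(2, 10, by = 1)
frac_kept <- sapply(cutoffs, function(ct) mean(row_median > ct))
data.frame(cutoff = cutoffs, frac_kept = frac_kept)
```

If the histogram shows a clear low-intensity peak, a cutoff just above it is a reasonable starting point; if it does not, the fraction-kept table at least shows how sensitive the result is to the choice.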
