Question

How To Determine If A Gene Has A Low Intensity/Expression In Microarrays ?

10.5 years ago

jerome.lane.34 ▴ 70

Hello,

I have read that in microarrays, genes with low intensity/expression are more prone to being false positives than more highly expressed genes.

I do not know what intensity threshold a gene must fall below to be considered lowly expressed.

I found that genes can be classified as not expressed, low, medium, or high expressing using signal-to-noise ratio (SNR) thresholds (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC436055/).

However, in this publication the SNR thresholds seem to be chosen arbitrarily.

Do you know of other approaches for classifying genes according to their expression levels?

gene classification microarray • 10k views

ADD COMMENT • link updated 9.3 years ago by Biostar 20 • written 10.5 years ago by jerome.lane.34 ▴ 70

I'd suggest that many cut-offs, thresholds etc. are chosen arbitrarily. There often isn't a "correct" value; it's a matter of how many true/false positives/negatives you are prepared to tolerate.

ADD REPLY • link 10.5 years ago by Neilfws 49k

score 0 · Answer 1 · 2013-10-23

10.5 years ago

Charles Warden 8.2k

If you create an MA plot, you can see how absolute intensity values correlate with variation:

http://en.wikipedia.org/wiki/MA_plot

When you see the points spread out more, your data is getting noisier. You can use this plot to pick a threshold below which to ignore low-expression genes (e.g. the intensity at which the variation in M starts to increase).

Different platforms and different normalization methods yield different signal distributions, so it isn't possible to give one universal cutoff for genes with low expression values.
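To make the idea concrete, here is a minimal Python sketch of how one might look for the point where M starts to spread out. It assumes two vectors of raw, positive intensities (two channels, or two arrays being compared). The function names `ma_values` and `spread_by_intensity` and the choice of equal-size bins are illustrative assumptions, not part of any microarray package.

```python
import math
import statistics


def ma_values(red, green):
    """Return a list of (A, M) pairs for two intensity vectors.

    M = log2(red) - log2(green)          (log ratio)
    A = 0.5 * (log2(red) + log2(green))  (mean log intensity)
    Spots with a non-positive intensity are skipped because log2 is undefined.
    """
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            continue
        m = math.log2(r) - math.log2(g)
        a = 0.5 * (math.log2(r) + math.log2(g))
        pairs.append((a, m))
    return pairs


def spread_by_intensity(pairs, n_bins=10):
    """Sort spots by A, cut them into bins of equal size, and return
    (median A, standard deviation of M) for each bin.

    n_bins is an arbitrary, hypothetical default; tune it to your array size.
    """
    ordered = sorted(pairs)
    size = max(2, len(ordered) // n_bins)
    summary = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        if len(chunk) < 2:
            continue  # stdev needs at least two points
        a_vals = [a for a, _ in chunk]
        m_vals = [m for _, m in chunk]
        summary.append((statistics.median(a_vals), statistics.stdev(m_vals)))
    return summary
```

Reading the summary from high A to low A, the bin where the standard deviation of M starts to climb is a candidate cutoff. It is still a judgment call, and it will differ between platforms and normalization methods.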

ADD COMMENT • link 10.5 years ago by Charles Warden 8.2k

As Neilfws suggests, and as is neatly supported by the figure at the bottom of the page you link to, there is no correct threshold. On the figure, there is no logical place to choose as a cutoff, only a gradual increase in the 'noisiness'. Speaking of that figure, it is a strange choice to present as typical data, since it contains at least two kinds of artefacts (a spur in the upper left pointing up and to the right, and one in the center pointing down).

ADD REPLY • link 10.5 years ago by Eric Normandeau 11k

score 0 · Answer 2 · 2013-10-23

We usually use the following procedure to choose genes that are expressed highly enough to analyse. It is arbitrary in the sense that WE choose the method and threshold, not in the sense that we pick some random method, but it is easy to explain and logical, so it has never given us problems during reviews. It works if you have blank spots or negative controls on your array.

1. We calculate the mean and standard deviation of the expression values of all the blanks and negative controls on one array (we do it separately for each colour in 2-colour arrays).
2. We define a threshold for each array (or colour) equal to the mean expression of the blanks plus two times the standard deviation (threshold = mean + 2*stdev). Assuming the blanks do not deviate too much from a normal distribution, about 95% of them fall within mean ± 2*stdev, so the threshold keeps spots that lie above roughly 97.5% of the blank distribution.
3. We keep a gene if it is above the threshold on more than 80% of the samples of at least one of the biological groups we are testing.

So, let's say you have 4 groups: if at least 1 group has a spot above the mean + 2 times the stdev for 80% or more of its samples, we retain it in the analysis.

Well, maybe not THAT simple to explain, but easy to understand and kind of logical. Plus, it retains spots that may not be expressed in one group but are expressed in another. These genes are especially interesting since they could very well be differentially expressed. Other methods may miss these genes.
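For anyone who wants to try it, the procedure can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `blank_threshold` and `keep_gene`, the dictionary layout, and the sample labels are all hypothetical. The description gives both "more than 80%" and "80% or more"; the sketch uses the inclusive form.

```python
import statistics


def blank_threshold(blank_values, n_sd=2.0):
    """Per-array (or per-colour) threshold: mean of blanks + n_sd * stdev."""
    return statistics.mean(blank_values) + n_sd * statistics.stdev(blank_values)


def keep_gene(gene_values, thresholds, groups, min_fraction=0.8):
    """Decide whether one gene (spot) is retained.

    gene_values: {sample: intensity of this gene on that sample's array}
    thresholds:  {sample: blank-based threshold for that array}
    groups:      {group name: list of samples in that biological group}

    The gene is kept if, in at least one group, the fraction of samples
    above their own array's threshold is >= min_fraction (assumed inclusive).
    """
    for samples in groups.values():
        if not samples:
            continue  # ignore empty groups
        above = sum(1 for s in samples if gene_values[s] > thresholds[s])
        if above / len(samples) >= min_fraction:
            return True
    return False


# Usage with made-up numbers: two groups of five arrays each.
groups = {"control": ["c1", "c2", "c3", "c4", "c5"],
          "treated": ["t1", "t2", "t3", "t4", "t5"]}
thresholds = {s: blank_threshold([10, 12, 14]) for g in groups.values() for s in g}
gene = {"c1": 5, "c2": 6, "c3": 4, "c4": 7, "c5": 5,
        "t1": 40, "t2": 35, "t3": 50, "t4": 38, "t5": 9}
retained = keep_gene(gene, thresholds, groups)
```

Because each array gets its own threshold, arrays with a noisier background are automatically held to a higher cutoff.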

score 0 · Answer 3 · 2013-10-24

When analyzing Affymetrix human exon arrays, I've used the DABG (detected above background) function from the Affymetrix Power Tools software. This provides a p-value after testing whether or not a probeset is detected above background. When using gene-level estimates, a threshold can be set, such as requiring the DABG P < 0.05 in ~50% of the samples of at least one group (as suggested here). This method requires you to have the raw CEL files, and I'm not sure whether it is appropriate for Affy arrays other than the human exon array.