Question: Microarray Detection Threshold
ff.cc.cc wrote:

Typical microarray datasets consist of log2-transformed intensity values (and sometimes their respective p-values). Values typically range from about 3 (very low expression) to 12 (very high expression).

But do you have a rule of thumb for a reasonable threshold to say that a probe has definitely been detected? (e.g. if a probe has log-intensity > 7 in at least n samples, we can assume the transcript is truly expressed)

Could such a threshold be tissue-dependent?

Thanks

earonesty wrote:

Each probe has its own "detection level": some probes are very sensitive and pick up more noise, others don't. That's why you cannot compare intensities between probes; you can only compare the same probe against itself on another array, i.e. differential expression. So a good detection algorithm doesn't use a single "threshold".

You do need to determine whether a probeset is significantly above its own background/noise level, but in practice algorithms use various tricks to estimate regional (think quadrant) noise on the array, as well as mismatch probe values. The keyword you're looking for is "detection call".

R package that does this for Affy arrays: http://rss.acs.unt.edu/Rdoc/library/affy/html/mas5calls.html
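For example, a minimal sketch using the affy package (assuming your CEL files are in the working directory; the "Present in at least 3 samples" cutoff is purely an illustrative choice):

    library(affy)

    # Read CEL files from the working directory into an AffyBatch
    raw <- ReadAffy()

    # MAS5 detection calls: a Wilcoxon test of PM vs MM probes per probeset.
    # exprs() on the result holds "P"/"M"/"A" calls, one row per probeset.
    calls <- mas5calls(raw)
    pma <- exprs(calls)

    # Keep probesets called "Present" in at least 3 samples (illustrative cutoff)
    keep <- rowSums(pma == "P") >= 3
    table(keep)

The resulting logical vector can then be used to subset the expression matrix you get from mas5() or rma() on the same AffyBatch.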

Article comparing various detection call algorithms, and explaining it better than I do: http://bib.oxfordjournals.org/content/11/2/244.full

I doubt this threshold would be tissue-dependent, but bear in mind there will always be batch effects when dealing with arrays (or chip effects with Illumina), so make sure the processing of samples is not sequential (first running the controls, then the experiment, etc.). Very often with arrays the effect you see comes from sample-handling issues, such as leaving the experimental group in a plate just a little longer than the controls. Randomization really helps here: processing samples in a random order distributes this variability evenly, making it much easier to deal with.


Thanks. So you suggest not using intensity filters like "keep only probesets with expression signal >= X in at least Y% of the arrays"?
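For reference, such a fixed-threshold filter would look roughly like this (a sketch only; eset, X and Y are placeholders, and the point of the answer above is that a detection-call-based filter is usually preferable to a single intensity cutoff):

    library(Biobase)

    # Hypothetical fixed-threshold filter on an existing ExpressionSet `eset`
    X <- 7      # log2-intensity cutoff (placeholder)
    Y <- 0.5    # required fraction of arrays above the cutoff (placeholder)

    expr <- exprs(eset)
    keep <- rowMeans(expr >= X) >= Y
    eset.filtered <- eset[keep, ]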
