8.4 years ago by
Washington University, St Louis, USA
My recommendation, based on experience with oligo arrays (e.g., Affymetrix expression arrays), is not to combine probe sets with a simple average, median, or similar summary when more than one probe set supposedly queries the same gene. Take the example of ESR1 (estrogen receptor), a very important gene in breast cancer. On the U133A array it is represented by 9 different probe sets, only one of which works as expected (see figure below). Averaging produces a terrible result. Even the cleverly redefined custom probe sets from the Michigan group don't perform well in this case (although in general they work much better than Affy's standard probe set definitions).
What you should do probably depends on your final goal. But if that goal involves identifying differentially expressed genes between conditions, or using expression values for clustering or classification, then I suggest:
- Choose the probe set/spot with the highest variance (across all samples in your study) for each gene. This is the kind of filtering you are likely to do anyway to reduce the multiple-testing problem, it is unbiased with respect to your comparison, and it avoids averaging real signal together with noise.
- An even safer option (in some ways) is to just leave all probe sets/spots in your analysis until the very final stage of biological interpretation. This way each probe set corresponding to a gene gets a chance. That can also be helpful if multiple probe sets map to the same gene locus but actually represent different transcripts.
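The first option can be sketched in a few lines of Python with pandas. This is only a minimal illustration, not a definitive implementation. It assumes a matrix of (log-scale) expression values with probe sets as rows and samples as columns, plus a probe-set-to-gene mapping. The function name `pick_most_variable_probe_sets`, the probe set IDs, the gene labels, and the numbers are all made up for the example.

```python
import pandas as pd


def pick_most_variable_probe_sets(expr: pd.DataFrame,
                                  probe_to_gene: pd.Series) -> pd.Series:
    """Return, for each gene, the ID of its highest-variance probe set.

    expr          : rows = probe sets, columns = samples (assumed log-scale).
    probe_to_gene : index = probe set IDs, values = gene symbols.
    """
    # Variance of each probe set across all samples in the study.
    variances = expr.var(axis=1)

    # Align the annotation to the expression matrix and drop
    # probe sets that have no gene assigned.
    genes = probe_to_gene.reindex(expr.index)
    annotated = genes.notna()

    # Within each gene, keep the probe set ID with the largest variance.
    return variances[annotated].groupby(genes[annotated]).idxmax()


# Toy data (hypothetical): two probe sets for GENE1, one for GENE2.
expr = pd.DataFrame(
    {"s1": [8.0, 5.0, 7.0], "s2": [2.0, 5.1, 7.5], "s3": [8.5, 4.9, 6.5]},
    index=["ps_a", "ps_b", "ps_c"],
)
probe_to_gene = pd.Series({"ps_a": "GENE1", "ps_b": "GENE1", "ps_c": "GENE2"})

chosen = pick_most_variable_probe_sets(expr, probe_to_gene)
selected = expr.loc[chosen.values]  # one row per gene, original probe set IDs kept
print(chosen)
print(selected)
```

Keeping the original probe set IDs in `selected` (rather than relabeling rows by gene) makes it easy to go back later and check which probe set was actually used for each gene.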
Figure explanation: The figure shows a set of several hundred breast cancer samples which were expected to be predominantly ER-positive but with a few ER-negative samples mixed in. The last probe set at the bottom shows the expected pattern of nice strong expression for most samples with a small subset showing very weak expression. Other probe sets show only a weak distinction between ER+ and ER- samples or don't have any discernible expression at all.