How To Remove Noise In Microarray Data?
9.8 years ago
gundalav

What is the current standard practice for removing noise (in terms of probe values)? This noise typically affects the interpretation of fold change.

We could simply keep all values above a certain threshold, but what would be the best way to determine that threshold?

Is there an R package for doing that?

9.8 years ago
David

For microarrays, noise is handled at the normalisation step. A robust normalisation followed by strict QC will eliminate faulty chips and improve your signal-to-noise ratio. There has been considerable research on removing noise from microarray data; the noise can originate from different sources, broadly classified as biological or technical [refs 1-4].
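The answer does not name a specific normalisation method. As one common robust choice (it is the normalisation step used inside RMA), here is a minimal quantile normalisation sketch in Python, assuming the data are passed as one list of intensities per chip. In R/Bioconductor you would normally rely on an existing implementation such as `preprocessCore::normalize.quantiles` or `affy::rma` rather than rolling your own.

```python
def quantile_normalise(chips):
    """Quantile-normalise a list of chips (each a list of probe intensities).

    After normalisation every chip has exactly the same intensity distribution:
    the value at each rank is replaced by the mean of that rank across chips.
    Ties are broken by position, which is simpler than averaging tied ranks.
    """
    n_probes = len(chips[0])
    if any(len(chip) != n_probes for chip in chips):
        raise ValueError("all chips must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [
        sum(sc[k] for sc in sorted_chips) / len(chips) for k in range(n_probes)
    ]

    normalised = []
    for chip in chips:
        # Indices of this chip's probes from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalised.append(out)
    return normalised


if __name__ == "__main__":
    # Two toy chips with three probes each (made-up intensities).
    chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalise(chips))
    # → [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Forcing every chip onto a shared distribution removes chip-wide intensity differences, which is why it is usually applied before any per-gene statistics; it does not by itself flag faulty chips, which is the job of the QC step.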

Usually, filtering on corrected (multiple-testing adjusted) p-values is the safest approach. An FC threshold is never a good way to cut down your list: a change in mRNA concentration is almost never proportional to the change in protein concentration, which implies that the significance of a given fold change is gene-specific.
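The answer does not say which correction to use. Assuming the common Benjamini-Hochberg FDR correction (what R's `p.adjust(p, method = "BH")` computes), a minimal Python sketch of adjusting per-gene p-values and filtering on the adjusted values could look like this. The gene names, p-values and the 0.05 cut-off are made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    # Gene indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_by_fdr(genes, pvalues, alpha=0.05):
    """Keep (gene, adjusted p) pairs whose BH-adjusted p-value is <= alpha."""
    adjusted = bh_adjust(pvalues)
    return [(g, q) for g, q in zip(genes, adjusted) if q <= alpha]


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]  # hypothetical
    raw_p = [0.001, 0.20, 0.012, 0.04, 0.90]
    for gene, q in filter_by_fdr(genes, raw_p, alpha=0.05):
        print(gene, round(q, 4))
```

With these toy values only geneA and geneC survive: geneD has a raw p-value of 0.04 but an adjusted value of about 0.067, which is exactly the kind of gene a raw-p or FC cut-off would have let through.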

If you want a filter based on a sound statistical approach that still uses the notion of FC, have a look at the RankProd package in R. Rank Products ranks the genes by fold change within each replicate and estimates an FDR from a permutation test. The paper is easy to read [ref 4]. I like to use RankProd when possible. It does not work if you have more than two conditions to compare, but it has a special mode that accounts for batch effects.
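To make the Rank Products idea concrete, here is a minimal Python sketch of the statistic described in the Breitling et al. paper [ref 4]: rank genes by log fold change within each replicate comparison, take the geometric mean of the ranks, and estimate the percentage of false positives (pfp, an FDR estimate) by shuffling each replicate independently. It only looks for up-regulated genes, breaks ties by position, and costs O(permutations × genes²), so it is an illustration rather than a substitute for RankProd (whose `RP` and `RPadvance` functions also handle down-regulation and, in `RPadvance`, multiple origins/batches). The input values are made up.

```python
import math
import random


def ranks_descending(values):
    """Rank 1 = largest value (most up-regulated); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def rank_products(replicates):
    """Geometric mean of each gene's rank across replicate log-FC vectors."""
    rank_lists = [ranks_descending(rep) for rep in replicates]
    k = len(replicates)
    n_genes = len(replicates[0])
    return [
        math.exp(sum(math.log(r[g]) for r in rank_lists) / k)
        for g in range(n_genes)
    ]


def rank_product_pfp(replicates, n_perm=200, seed=0):
    """Return (rank products, pfp estimates) for up-regulation."""
    rng = random.Random(seed)
    observed = rank_products(replicates)
    n_genes = len(observed)

    # Expected number of genes with RP <= observed[g] under the null,
    # estimated by shuffling each replicate's values independently.
    expected = [0.0] * n_genes
    for _ in range(n_perm):
        shuffled = []
        for rep in replicates:
            rep_copy = list(rep)
            rng.shuffle(rep_copy)
            shuffled.append(rep_copy)
        null_rp = rank_products(shuffled)
        for g in range(n_genes):
            expected[g] += sum(1 for v in null_rp if v <= observed[g]) / n_perm

    # pfp = E(RP) / rank of the observed RP among all genes.
    pfp = []
    for g in range(n_genes):
        obs_rank = sum(1 for v in observed if v <= observed[g])
        pfp.append(min(1.0, expected[g] / obs_rank))
    return observed, pfp


if __name__ == "__main__":
    # Three replicate comparisons, 20 genes; gene 0 is strongly up in all three.
    reps = [
        [5.0] + [((g * 7 + r * 3) % 19) / 10 for g in range(1, 20)]
        for r in range(3)
    ]
    rp, pfp = rank_product_pfp(reps)
    print(rp[0], pfp[0])
```

Because only ranks enter the statistic, a gene needs to be consistently near the top across replicates rather than to have one very large fold change, which is what makes the approach more robust than a plain FC cut-off.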

References:
1. Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, eds. 2005.
Bioinformatics and Computational Biology Solutions Using R and Bioconductor (Statistics for Biology and Health). 1st ed. Springer.

1. Kauffmann A, Huber W. 2010.
Microarray data quality control improves the detection of differentially expressed genes. Genomics.

2. Shi L, Reid L, Jones W, Shippy R, Warrington J, Baker S, Collins P, de Longueville F, Kawasaki E, Lee K, et al. 2006.
The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24: 1151–1161.

3. Witten DM, Tibshirani R. 2007.
A comparison of fold-change and the t-statistic for microarray data analysis.

4. Breitling R, Armengaud P, Amtmann A, Herzyk P. 2004.
Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 573: 83–92.

9.8 years ago
polarise

I only have experience with Affymetrix exon arrays. It seems that Affymetrix arrays are the most popular for expression profiling. There are two main ways you could go about it:

1. Detection Above Background (DABG): computes p-values for the probe (probe set) in question relative to a group of background probes with similar GC content.
2. MAS5 Detect: just what it says. It gives present/absent calls on whether a probe (probe set) is expressed.
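The DABG idea in item 1 can be sketched in a few lines of Python. This is a simplified illustration, not the APT implementation: each probe's p-value is taken as the fraction of background probes with the same GC count that are at least as bright (with an add-one correction, our choice, so p is never 0), and probe p-values are combined into a probe-set p-value with Fisher's method, which is assumed here as the combination rule. Sequences and intensities are made up, and real probes are 25-mers rather than the short strings used below.

```python
import math
from collections import defaultdict


def gc_count(sequence):
    """Number of G or C bases in a probe sequence."""
    return sum(1 for base in sequence.upper() if base in "GC")


def build_background(background_probes):
    """Group background intensities by GC count: {gc: [intensities]}."""
    groups = defaultdict(list)
    for sequence, intensity in background_probes:
        groups[gc_count(sequence)].append(intensity)
    return groups


def probe_pvalue(sequence, intensity, background):
    """Empirical p-value against background probes with matching GC count."""
    pool = background.get(gc_count(sequence), [])
    if not pool:
        return 1.0  # no comparable background probes: no evidence of detection
    at_least_as_bright = sum(1 for b in pool if b >= intensity)
    return (at_least_as_bright + 1) / (len(pool) + 1)


def fisher_combine(pvalues):
    """Fisher's method: -2*sum(log p) ~ chi-squared with 2k df under H0."""
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    # Closed-form chi-squared survival function for even df (2k):
    # P = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half / j
        total += term
    return math.exp(-half) * total


def probeset_pvalue(probes, background):
    """probes: list of (sequence, intensity) pairs for one probe set."""
    return fisher_combine([probe_pvalue(s, i, background) for s, i in probes])


if __name__ == "__main__":
    # Hypothetical background: 99 probes with 4 G/C bases, intensities 1..99.
    background = build_background([("GCGC", float(v)) for v in range(1, 100)])
    print(probe_pvalue("CCGG", 95.5, background))  # 5 / 100 = 0.05
    print(probeset_pvalue([("CCGG", 95.5), ("GGCC", 99.5)], background))
```

Matching on GC content matters because G/C-rich probes hybridise more strongly and are brighter even when nothing is expressed; comparing against all background probes would make GC-rich probes look "detected" too easily.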

Both are available in Affymetrix Power Tools (APT): http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx?hightlight=true&rootCategoryId=34002#1_2

There is a good paper by Lockstone on carrying out this kind of analysis: http://bib.oxfordjournals.org/content/12/6/634.short

Your question mentions probes, but Affymetrix data are summarised at the level of probe sets (sets of probes). If you want probe-level data, you'll have to extract the probe intensities (with or without additional processing). Your best bet is Affymetrix Power Tools (also discussed in the Lockstone paper above); in this case you'll use the apt-cel-extract command.

PK