Question: What Are The Most Reliable Normalization Methods For Microarrays?
3
9.2 years ago by
Jarretinha3.3k
São Paulo, Brazil
Jarretinha3.3k wrote:

Hi people,

I've just attended a seminar focused on microarray data, given essentially by experimentalists. It was somewhat shocking that they were unable to agree on which methods to use for data normalization (and why). So, you can imagine what happened in the later steps...

Hence, I'm wondering about a list of the most reliable methods for data normalization. Not a plain list of methods/models, but a list explaining why a given method/model is reliable (or why someone should use it).

Just to avoid confusion: in this context, "reliability" carries its statistical meaning.

This question is relevant because the most popular normalization procedures depend on statistical models to address probe-level, background-level, etc., variation/correlation. For example, RMA and fRMA use a linear model.

So, given the number of microarray platforms and designs, reliability is of utmost importance.

model data microarray • 6.0k views
modified 8.3 years ago by Michael Dondrup46k • written 9.2 years ago by Jarretinha3.3k

Can you give a short definition of what you mean by reliability in the statistical sense? I confess I had to look up the definition myself, but if it is as is wikipedia "In statistics, reliability is the consistency of a set of measurements or measuring instrument, often used to describe a test. Reliability is inversely related to random error." then the question makes no sense, because most or all normalization methods are deterministic.

It's true that many methods are deterministic. But the most widely used ones are model-based, and hence stochastic/statistical in nature (e.g. RMA). If one treats normalization as an experiment (which indeed it is), this question makes a lot of sense, though.

It is simply wrong to say that a method becomes non-deterministic just because it involves 'a model'! A linear model, given the same data, always reproduces the same results. So, reliability is of "utmost importance", but it is solved.
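The determinism point is easy to check empirically. Here is a minimal sketch in Python/NumPy with made-up toy data (not microarray data): fitting the same linear model twice on the same inputs returns bit-identical coefficients, because the OLS solution is a closed-form function of the data.

```python
import numpy as np

# A linear model fit is a deterministic computation: the OLS solution
# is a closed-form function of the data, so refitting on the same
# data always returns the same coefficients. Toy data below.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=50)

beta1, *_ = np.linalg.lstsq(X, y, rcond=None)
beta2, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.array_equal(beta1, beta2)  # identical, not merely close
```

Note that "deterministic" here means same data, same result; it says nothing about whether the model's assumptions fit the data, which is the separate question the thread keeps circling.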

RMA depends on a linear statistical model; you can check the paper if you want. I agree that if you fit a given linear model with OLS, it will always give the same results. Still, it is a statistical model. But normalization is not that simple! People use a wealth of techniques. If it were just linear regression, this question would be trivial. But since it depends in many instances on M-estimators, specific training sets, etc., I still think it's not solved. Otherwise, people wouldn't gather in a room for two days to discuss which method is the most reliable.

I think asking for the "best method" ends up being less productive than asking for opinions on the strengths and weaknesses of a few existing methods.

6
9.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

I recommend that you visit PubMed, enter "microarray (normalisation OR normalization)" as a query, select some of the review articles and have a good read. Then, armed with appropriate keywords (RMA, GCRMA, MAS), head to Google and obtain some more opinions.

It is not so surprising that people cannot agree on methods. A normalisation method is just a statistical model that tries to explain what happens when probes meet gene chips. Different models make different assumptions. Some of these are: how do you distinguish within-array effects from between-array effects? Are mismatch probes ever useful? (RMA says no, because MM probes often, in fact, match.) How is "background" distributed across a chip?

Experiments also vary. Which of your experiments are comparable? Should you even be comparing, say, samples prepared last week and frozen to samples freshly-prepared today? If you do want to compare, you can only hope that each set will show some characteristic (batch effect) for which you can correct.

How do you conclude that a method is "right", or "better"? You might try to validate using another experimental method, such as real-time PCR. Or you might conduct "spike-in" experiments, where you know what the "true positives" should be, then see how well each method picks them out. That's the approach taken in this paper. Or, you might try several methods on your own favourite dataset. Of course, someone else will then try them on their favourite dataset - and reach totally opposite conclusions! Or you might just ask "what do most people do?"
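The spike-in idea above can be sketched in a few lines of Python. Everything here is invented for illustration (the probe names and scores are not from any real spike-in set): since we know which probes were spiked in, we can ask how many of them each pipeline ranks near the top.

```python
# Sketch of spike-in evaluation: we know which probes were spiked in,
# so we can count how many of them each normalization/analysis
# pipeline places among its top-scoring probes. All numbers are
# hypothetical, purely for illustration.

def recovered_at_k(scores, spiked, k):
    """Count how many known spike-ins appear in the top-k scored probes."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for probe in top_k if probe in spiked)

# Hypothetical differential-expression scores from two pipelines.
scores_method_a = {"p1": 9.1, "p2": 8.7, "p3": 1.2, "p4": 7.9, "p5": 0.3}
scores_method_b = {"p1": 8.8, "p2": 2.1, "p3": 1.0, "p4": 9.5, "p5": 6.6}
spiked = {"p1", "p2", "p4"}  # the known true positives

print(recovered_at_k(scores_method_a, spiked, 3))  # 3 of 3 recovered
print(recovered_at_k(scores_method_b, spiked, 3))  # 2 of 3 recovered
```

Real spike-in benchmarks do essentially this at scale, sweeping k to trace out an ROC-style curve per method.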

Let me say that I do understand the tests; I'm one of the guys who develops statistical tests for exactly this purpose. My question is about reliability, not power or accuracy. I found out that my notion of reliability (which is based on ideas from statistical mechanics) is very different from that of people at the wet bench. IMHO microarrays are not suitable for differential gene expression and similar experiments (the type-II error rate is way too high). But as experimentalists keep using them, reliability is still an important matter.

If you develop the tests, you should be telling us which are the most reliable :-)

I tend to agree with Michael's comment at the top. Normalization is not a measurement. If anything, the raw intensity is the measurement. But it is not a measurement in the same way that, say, putting a thermometer in water is a measurement. You might have a hypothesis about what observed intensities should be, but variation will ensure that this will never be consistent.

I also agree with you that many microarray experiments are poor measures of gene expression - but that's the way the science chose to go.

As I said in other comments, one can treat the normalization procedure as an experiment over the raw intensities; there is no impediment to doing that. The normalization procedure will be your thermometer. And, just to note, RMA does have hypotheses about intensities.

3
9.2 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

The lack of reproducibility in microarray methods is a well-known problem. In my opinion, the reasons for this go well beyond the choice of normalization and are primarily caused by biological and experimental variability. Some are convinced that one method must be substantially better than another, but I suspect that is because that particular method worked well for them under some specific circumstances.

I have read studies demonstrating that the upper 50% (the strongest signals) were recovered identically across just about all methodologies, whereas the bottom half contained a different subset for each method. So maybe the best strategy is to be stricter with the results, beyond what the original estimate of the significance is; of course, it could be that this approach removes the genes of interest.
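This strong-versus-weak-signal effect is easy to reproduce with a toy simulation. The sketch below (Python/NumPy; the noise model and all parameters are invented for illustration, not fitted to real chips) stands in two independent noisy readings of the same truth for two pipelines: strong signals dwarf the additive background, so their top ranks agree, while weak signals drown in it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Toy model: log-normal "true" expression plus additive background
# noise. Strong signals dwarf the background; weak signals drown in
# it. The two "methods" are just two independent noisy readings of
# the same truth, standing in for two analysis pipelines.
true_signal = rng.lognormal(mean=5, sigma=2, size=n)
method_a = true_signal + rng.normal(0, 200, size=n)
method_b = true_signal + rng.normal(0, 200, size=n)

def top_k(values, k):
    """Indices of the k strongest signals."""
    return set(np.argsort(values)[-k:])

def bottom_k(values, k):
    """Indices of the k weakest signals."""
    return set(np.argsort(values)[:k])

k = 100
overlap_top = len(top_k(method_a, k) & top_k(method_b, k)) / k
overlap_bottom = len(bottom_k(method_a, k) & bottom_k(method_b, k)) / k
print(f"agreement among strongest signals: {overlap_top:.2f}")
print(f"agreement among weakest signals:   {overlap_bottom:.2f}")
```

The strongest signals agree almost perfectly between the two readings, while the weakest ones overlap barely above chance, which matches the studies' observation without invoking any difference in normalization at all.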

That's the long answer. The short answer: the best normalization is the one you understand the best.

;-)

Absolutely agree; it's easy to get wrapped up in normalization choice and forget other factors.

Again, I'm asking about reliability. After normalization, everything is pretty straightforward. Those studies seem very interesting! Could you name them?

I think it is the normalization that is pretty straightforward: you pick a method and run it. The interpretation that comes after is a lot more difficult. Search for "reproducibility of microarray data" for many papers on this. I was quoting from memory, not from a document.

Normalization is the main source of variability (or lack of it) in microarray data. As the intensity scale is non-linear and most normalization procedures use a linear model, the choice of model does alter the final result.

You might have to look at the definitions of some of the terms you are using and clean that up first; you are using them incorrectly. E.g. "intensity level is non-linear" has no meaning. "Most normalization procedures do use a linear model": where did you get that information? "Normalization is the main source of variability (or lack of it) in microarray data": how can you arrive at this judgement? IMHO this is totally mistaken...

Clarifying the meaning: the intensity scale is non-linear. Just to mention: RMA, GC-RMA, fRMA and quantile normalization use a linear model, and I'm sure I can find more examples. And a random reference about methods says: "Consistent with previous results we observed a large effect of the normalization method on the outcome of the expression analyses." This observation is quite reasonable, as the intensity scale (which defines the experiment) is what gets normalized.

3
9.2 years ago by
Michael Dondrup46k
Bergen, Norway
Michael Dondrup46k wrote:

Following this definition, all deterministic methods are 100% reliable, because they always reproduce the same result when repeated. Reliability is, of course, important for measurements, but data transformations are not measurements. There are some statistical methods that are non-deterministic (though no normalization methods that I know of), for example those involving the EM algorithm or k-means clustering.

So, my advice: check whether the methods are deterministic; if so, they are reliable by definition. The question of reliability is certainly relevant for measurement techniques such as microarrays, qPCR and RNA-seq, but it is completely solved for normalization (that is: ALL such methods are deterministic/reliable). If you are looking for a problem to solve in normalization, this is definitely not the right place.

BTW: one can easily assess the reliability. If you want to check RMA, loess normalization, mean or quantile normalization, just run it on the same input data, say, 1000 times and look at the results. BTW2: RMA (Robust Multichip Average), since it was mentioned, is not (only) normalization; it comprises background subtraction, quantile normalization (a totally deterministic method), and intensity summarization.
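To make the determinism of quantile normalization concrete, here is a minimal Python/NumPy sketch of the method (a naive illustration, not the Bioconductor implementation; the 4x3 matrix is toy data): running it twice on the same matrix yields bit-identical output.

```python
import numpy as np

def quantile_normalize(matrix):
    """Plain quantile normalization: force every column (array) to
    share the same empirical distribution, namely the mean of the
    per-column sorted values."""
    order = np.argsort(matrix, axis=0)       # per-column rank order
    sorted_cols = np.sort(matrix, axis=0)
    reference = sorted_cols.mean(axis=1)     # the shared distribution
    result = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        result[order[:, j], j] = reference   # map ranks back to values
    return result

# Toy 4-probe x 3-array intensity matrix.
data = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 4.0, 6.0],
                 [4.0, 2.0, 8.0]])

run1 = quantile_normalize(data)
run2 = quantile_normalize(data)
assert np.array_equal(run1, run2)  # same input, same output
```

After normalization, every column contains exactly the same set of values (here 2, 3, 14/3, 17/3), just in the order dictated by each array's ranks, and repeated runs cannot differ.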

Edit: Just to qualify what I said above. There are some reliability issues with normalization. I just saw a message on the Bioconductor list noting differences in GCRMA results between Windows and Linux. As said, most normalization and summary methods are deterministic as long as the data and methods stay the same. However, there can be variation at the probe level, even when using the same array design. The most common source of such events is that the array annotation, and thereby the probe-level groups and their assignments to genes, is changed.

This is a sort of "pseudo-(un)reliability", because if all parameters are the same, the results are the same. But the annotations are frequently changed, and annotation updates are mostly applied automagically without the user noticing the difference. This is specifically true for the Affy platform.

I've checked. Most methods are statistical and rely on parameter estimation for normalization. Looking at 1000 runs is almost the same as Monte Carlo estimation. Of course it will produce the same result, at least on average. It's not a complicated question; I asked about the reliability of the method! Even quantile normalization relies on parameter estimation. My question is totally unrelated to the precision of one's computer...

No, not on average: exactly. I recommend you really try this out before you claim something. Use one and the same dataset and run the same method, say RMA, 1000 times, then publish the result here. Also, you are mistakenly conflating parameter estimation with non-deterministic outcomes. So please, try it out with one technique first.