Question: Quantile normalization and variability questions
Asked 2.6 years ago by arronar (Austria):

Hi, I am new to biostatistics and I would like to understand a few things about the quantile normalization technique.

Let's assume that I have a microarray dataset that contains a control set (4 replicates) and a treated condition (4 replicates), and I want to look for differentially expressed genes.

The most common first step I have seen on the net is to normalize the data with quantile normalization. Before normalization, the density functions of the data look like this:

[Image: density plots of the samples before normalization]

Assuming now that there are no differentially expressed genes, we can apply quantile normalization to all of our data so that every sample has the same distribution.

[Image: density plots of the samples after quantile normalization]

This is where I'm getting a little confused. Until now I understood that if I run a t-test for a specific gene between these two conditions (control vs. treated), I get a t-value on the null hypothesis distribution and then calculate the p-value. If that p-value is smaller than the α = 5% threshold, I can say that there is only a small probability of getting a difference like this, or a more extreme one, between the two conditions if my null hypothesis is true. In such a situation we usually say that we can reject the null hypothesis and conclude that the difference is due to the treatment, which has its own (different) distribution compared to the control's.

So how is this going to work if we turn the two distributions into one with quantile normalization?

Is my way of thinking about it right, or am I missing something?

If I am missing something, then some representation of variability other than the density function of each condition should show that variability still exists between the two conditions. So, is there another way to represent that variability?

Thank you.

Modified 2.6 years ago by Fabio Marroni • written 2.6 years ago by arronar
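
As a rough sketch of the per-gene test described in the question: the t statistic is computed from one gene's values across the samples (4 control vs. 4 treated), not from the overall density of each sample. Quantile normalization equalizes the distribution of values within each sample, but the values of an individual gene can still differ between conditions. The example below uses Welch's variant of the t statistic and only the Python standard library; the gene names and expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic for one gene: difference in group means
    divided by the standard error of that difference."""
    se = sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


# Hypothetical normalized log-expression values, one entry per gene:
# (4 control replicates, 4 treated replicates). The numbers are invented.
genes = {
    "gene_A": ([7.0, 7.2, 6.9, 7.1], [9.0, 9.3, 8.8, 9.1]),  # shifted in treated
    "gene_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 4.9, 5.2, 5.0]),  # same values, reordered
}

# One t statistic per gene, each computed across samples.
t_values = {name: welch_t(ctrl, trt) for name, (ctrl, trt) in genes.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

Turning a t statistic into a p-value additionally needs the t distribution, which is not in the standard library; in practice a statistics package would be used for that step.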

This paper will really help your understanding of when to use quantile normalisation: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0679-0

Reply written 2.6 years ago by James Ashmore

Thanks. I'm gonna read it.

Reply written 2.6 years ago by arronar

I am not sure, but I think that you apply quantile normalization to each distribution, and you then get as many distributions in the output as you had in the input. See e.g. https://en.wikipedia.org/wiki/Quantile_normalization

Reply written 2.6 years ago by Fabio Marroni

Do you mean that you apply quantile normalization to the control samples and the treatment samples separately?

Reply written 2.6 years ago by arronar

I think this is incorrect. Quantile normalisation builds a single reference distribution from all of your samples and then uses it to normalise each sample individually. This is appropriate when the variability across groups is small. When the distributions differ greatly between your groups (e.g. samples from different tissues), quantile normalisation can mask real differences; in that case you can use smooth quantile normalisation instead.

Reply written 2.6 years ago by James Ashmore
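
The procedure described in the comment above (one reference distribution built from all samples, then applied to each sample individually) can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used by any particular package: the function name `quantile_normalize` and the toy values are assumptions, and ties are broken by position rather than averaged as some implementations do.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize a list of samples (one list of gene values per array).

    All samples must contain the same genes in the same order.
    """
    n_genes = len(samples[0])
    if any(len(sample) != n_genes for sample in samples):
        raise ValueError("all samples must have the same number of genes")

    # Reference distribution: the mean of the k-th smallest value
    # across all samples, for every rank k.
    sorted_samples = [sorted(sample) for sample in samples]
    reference = [mean(s[k] for s in sorted_samples) for k in range(n_genes)]

    normalized = []
    for sample in samples:
        # Gene indices ordered from the smallest to the largest value
        # within this sample (ties broken by position).
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            # Each gene receives the reference value for its rank.
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, three genes each (values are invented).
normalized = quantile_normalize([[2.0, 4.0, 6.0], [30.0, 10.0, 20.0]])
print(normalized)
```

After normalization every sample contains exactly the same set of values, but each gene keeps the rank it had within its own sample, so a gene that ranks high in the treated samples and low in the controls still differs between the groups.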
Answered 2.6 years ago by Fabio Marroni (Italy):

No, you apply the normalization to each sample you have. If you read the link I sent you, you will understand; there is an example there. (Sorry, this was intended to be a comment.)

Written 2.6 years ago by Fabio Marroni

Yes, but Wikipedia's example does not explain what each column stands for. As I said, I have 4 technical replicates for the control and 4 for the treated condition, so let's say I have 8 columns in total. Should I normalize them together or separately (controls on their own and treated on their own)?

Reply written 2.6 years ago by arronar

You should normalize them together. However, I read the comments by James Ashmore, and I would suggest taking them into consideration.

Reply written 2.6 years ago by Fabio Marroni