Hi, I am new to biostatistics and I would like to understand a few things about the quantile normalization technique.
Let's assume that I have a microarray dataset that contains a control condition (4 replicates) and a treated condition (4 replicates), and I want to look for differentially expressed genes.
The most common first step I have seen online is to normalize the data with quantile normalization. Before normalization, the density functions of the data look like this:
If we now assume that there are no differentially expressed genes, we can apply quantile normalization to all of our data and force every array to have the same distribution.
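To make clear what I mean by quantile normalization, here is a minimal sketch of the procedure as I understand it, in Python with NumPy. The helper `quantile_normalize` and the simulated data are my own assumptions for illustration, not a reference implementation, and ties are not handled specially.

```python
import numpy as np

def quantile_normalize(x):
    # x has shape (genes, arrays); ties are broken arbitrarily in this sketch
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each gene within its array
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of the k-th smallest values across arrays
    return reference[ranks]                            # replace each value by the reference value at its rank

rng = np.random.default_rng(0)
# Hypothetical data: 1000 genes x 8 arrays, each array with its own shift and scale
raw = rng.normal(8.0, 1.0, size=(1000, 8)) * rng.uniform(0.8, 1.2, size=8) + rng.normal(0.0, 0.5, size=8)
normalized = quantile_normalize(raw)

# After normalization every array (column) contains the same set of values
sorted_cols = np.sort(normalized, axis=0)
print(np.allclose(sorted_cols, sorted_cols[:, :1]))
```

As far as I can tell, only the values attached to each rank change. The ordering of genes within each array stays the same.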
This is where I am getting a little confused. Until now, my understanding was that if I run a t-test for a specific gene between the two conditions (control vs. treated), I get a t-value, locate it on the distribution expected under the null hypothesis, and calculate the p-value. If that p-value is below the α = 5% threshold, I can say that, if my null hypothesis were true, the probability of observing a difference this large or more extreme between the two conditions would be that small. In such a situation we usually reject the null hypothesis and conclude that the difference is due to the treatment, which has its own (different) distribution compared to the control's.
So how is this going to work if we turn the two distributions into one with quantile normalization?
Is my way of thinking about this right, or am I missing something?
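To make the question concrete, here is a small simulation I would use to check it, again as a sketch under my own assumptions: the data are simulated, the first 20 genes get an arbitrary treatment effect of +3, and `quantile_normalize` is the same hypothetical helper as before. The per-gene test uses `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

def quantile_normalize(x):
    # x has shape (genes, arrays); ties are broken arbitrarily in this sketch
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    return reference[ranks]

rng = np.random.default_rng(1)
n_genes, n_rep, n_de = 1000, 4, 20

baseline = rng.normal(8.0, 1.5, size=(n_genes, 1))        # per-gene expression level
noise = rng.normal(0.0, 0.3, size=(n_genes, 2 * n_rep))   # replicate-to-replicate noise
array_shift = rng.normal(0.0, 0.5, size=(1, 2 * n_rep))   # technical shift of each array
raw = baseline + noise + array_shift
raw[:n_de, n_rep:] += 3.0                                  # assumed treatment effect on the first 20 genes

normalized = quantile_normalize(raw)
control, treated = normalized[:, :n_rep], normalized[:, n_rep:]

# One two-sample t-test per gene (row), comparing control and treated replicates
t_stat, p_val = stats.ttest_ind(control, treated, axis=1)

print("median p-value, shifted genes:", np.median(p_val[:n_de]))
print("fraction with p < 0.05, other genes:", np.mean(p_val[n_de:] < 0.05))
```

If I read this correctly, the normalization equalizes the distribution of each array (column), while the t-test compares the replicates of a single gene (row), but I am not sure this is the right way to look at it.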
If I am missing something, then presumably some representation other than the per-condition density functions would show that variability between the two conditions still exists after normalization. Is there another way to represent that variability?