Hello,
I am trying to analyse microarray data: a time series (5 time points) of gene expression (GE) for 3000 genes. I want to calculate all-vs-all (3000 × 3000 genes) Pearson correlation coefficient (PCC) values. I searched the literature extensively and found that many studies use the log (base 2) of the ratio of the median intensity of the test spot to the median intensity of the control spot. The intensity values are already normalised (background normalised). Other studies, however, subtract a gene's control GE intensity from its test intensity.
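For illustration, the two transformations described above can be sketched in NumPy roughly as follows. This is a minimal sketch, not a standard pipeline: the intensity arrays are hypothetical background-corrected median intensities (2 genes × 5 time points), and the function names `log2_ratio` and `difference` are made up for this example.

```python
import numpy as np


def log2_ratio(test, control):
    """log2(test / control) per gene and time point.

    Non-positive intensities (possible after background subtraction) have no
    logarithm, so they are returned as NaN rather than silently clipped.
    """
    test = np.asarray(test, dtype=float)
    control = np.asarray(control, dtype=float)
    out = np.full(test.shape, np.nan)
    ok = (test > 0) & (control > 0)
    out[ok] = np.log2(test[ok] / control[ok])
    return out


def difference(test, control):
    """test - control per gene and time point (the subtraction approach)."""
    return np.asarray(test, dtype=float) - np.asarray(control, dtype=float)


# Hypothetical background-corrected median intensities: 2 genes x 5 time points.
test = np.array([[1200.0, 1000.0, 1100.0, 1800.0, 2600.0],
                 [300.0, 350.0, 420.0, 450.0, 480.0]])
control = np.array([[1140.0, 1090.0, 1097.0, 1100.0, 1130.0],
                    [367.0, 372.0, 358.0, 353.0, 359.0]])

print(log2_ratio(test, control))
print(difference(test, control))
```

One difference worth noting: the log2 ratio is unchanged if both channels of a spot are multiplied by the same factor, whereas the difference scales with that factor.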
Below are the values for two genes (at the 5 time points) that I used to calculate PCC. The problem is that when I plot the distribution of the correlation coefficients (3,000 × 3,000 = 9,000,000 values), about half (~50%) of the values fall between 0.6 and 1.0. This would mean that most genes are highly correlated with each other. That is unexpected, because usually only a very small fraction of genes show such high correlations with each other. I therefore suspect that the normalisation used to derive the following ratios may not be suitable when the normalised values are used to calculate correlation coefficients:
Gene A: 0.0715 -0.1203 0.0039 0.7151 1.202
Gene B: -0.288 -0.0900 0.2310 0.3510 0.415
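A minimal NumPy sketch of the PCC computation, assuming the expression values are held in a genes × time-points array. It computes PCC for Gene A and Gene B above, then builds an all-vs-all matrix and keeps only the unique gene pairs, since the full 3,000 × 3,000 matrix counts every pair twice and includes 3,000 self-correlations of 1. The 3000 × 5 matrix and the helper name `unique_pair_correlations` are hypothetical; the matrix is random stand-in data, not real measurements.

```python
import numpy as np

# Log2 ratios for the two genes quoted above (5 time points each).
gene_a = np.array([0.0715, -0.1203, 0.0039, 0.7151, 1.202])
gene_b = np.array([-0.288, -0.0900, 0.2310, 0.3510, 0.415])

# Pearson correlation of one pair: off-diagonal entry of the 2x2 matrix.
r_ab = np.corrcoef(gene_a, gene_b)[0, 1]
print(f"PCC(A, B) = {r_ab:.3f}")


def unique_pair_correlations(expr):
    """All-vs-all PCC for a genes x time-points matrix (hypothetical helper).

    np.corrcoef treats each row as one variable, so it returns a
    genes x genes matrix. Only the strict upper triangle is returned:
    the diagonal (each gene with itself) and the mirrored lower triangle
    would otherwise inflate the histogram. Rows with zero variance
    produce NaN correlations.
    """
    corr = np.corrcoef(expr)
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return corr[rows, cols]


# Hypothetical stand-in for the real 3000 x 5 matrix (random values).
rng = np.random.default_rng(0)
expr = rng.normal(size=(3000, 5))
pairs = unique_pair_correlations(expr)
print(len(pairs))  # 3000 * 2999 / 2 = 4,498,500 unique pairs
print(f"fraction with PCC >= 0.6: {np.mean(pairs >= 0.6):.3f}")
```

Running the same function on random profiles, as in the stand-in, gives a rough baseline for how many correlations of 0.6 or more five time points can produce by chance; comparing that baseline with the real data may help judge how unusual the observed ~50% is.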
Kindly advise which of the two kinds of values above is appropriate for calculating all-vs-all Pearson correlation coefficients.