I am trying to figure out whether there is any relationship between the expression time series curve of a gene and its function. For this purpose I plotted time series for over 5,000 yeast genes. Unfortunately, the standard Pearson correlation effectively normalizes every curve (it centers and scales each one) before comparing them, regardless of whether the expression of a gene changes a lot or not at all. That is why I am now looking for a different kind of correlation function, one that does not automatically normalize before calculating correlations / similarities but is instead based on the absolute values of each curve. That way I hope to avoid getting highly correlated curves for genes that don't seem to have anything to do with one another. I only want the curves to come out as highly correlated if not only their relative shapes but especially their absolute values are very similar. What kind of comparison function should I use for this purpose? I'd prefer R, since I generated the attached plots in R. My plots show the same as those from the publication from which I took the microarray gene expression data. I used the following study:
"Global control of cell-cycle transcription by coupled CDK and network oscillators" by
David A. Orlando, Charles Y. Lin, Allister Bernard, Jean Y. Wang, Joshua E. S. Socolar, Edwin S. Iversen, Alexander J. Hartemink & Steven B. Haase, doi:10.1038/nature06955 (http://www.nature.com/nature/journal/v453/n7197/edsumm/e080612-19.html)
I was able to replicate their results even though I intentionally did not normalize, because I feel that normalizing is cheating and treats some genes unfairly.
My results look similar to their Figure 2 (see http://www.nature.com/nature/journal/v453/n7197/fig_tab/nature06955_F2.html#figure-title).
They identified 6 genes that oscillate particularly strongly with the cell cycle and can therefore be considered cell cycle drivers. Those genes are CLN2, RNR1, SIC1, NIS1, CDC20 and ACE one (see the text below Figure 2). In my attached plots, too, the same genes have the highest variance. But there are many genes with putative and unknown functions that seem to cycle with them and could therefore be considered as regulated by the same mechanism.
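For reference, the variance ranking mentioned above can be sketched roughly as follows. The matrix `expr` and the gene names are made-up placeholders, not the real GSE8799 data, and the sketch assumes genes in rows and time points in columns.

```r
set.seed(1)

# Hypothetical stand-in for the expression matrix:
# 5 genes (rows) x 15 time points (columns), nearly flat with small noise.
time_min <- seq(0, by = 16, length.out = 15)
expr <- matrix(rnorm(5 * 15, mean = 8, sd = 0.1), nrow = 5,
               dimnames = list(paste0("gene", LETTERS[1:5]), NULL))

# Make one placeholder gene oscillate strongly, like a cell cycle driver.
expr["geneC", ] <- 8 + 3 * sin(2 * pi * time_min / 120)

# Per-gene variance and min-to-max fluctuation, computed on the raw values.
gene_var   <- apply(expr, 1, var)
gene_range <- apply(expr, 1, function(x) diff(range(x)))

# Genes ordered from most to least variable.
ranked <- sort(gene_var, decreasing = TRUE)
print(ranked)
```

With real data, `expr` would be replaced by the wild-type columns of the actual expression matrix.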
Now I am looking for a way to measure the similarities / correlations between these 6 cell cycle driver genes and the rest. That way I hope to define regulatory units. But I don't want normalization, because it sets the fluctuation (i.e. the difference between the minimum and maximum of each time series curve) equal across curves even when it is not. I tried correlation based on normalization, but then I could not find any GO term enrichment among the many seemingly correlated genes, whose trajectories had a correlation above 0.85 despite completely unrelated functions. I want an absolute correlation, in which only trajectories that would almost lie on top of each other based on their absolute values get a high score, rather than trajectories that merely share the overall shape of the curve after normalization. I am not sure whether this approach will definitely work better than our many failed attempts to master controlling the aging process to the point where we can effectively and permanently reverse it, so that old age and death no longer threaten us. What R function can calculate such an absolute, rather than normalization-based, correlation?
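To make concrete what I mean by an absolute comparison, here is a minimal sketch with invented toy curves (not real data). It contrasts Pearson correlation with two candidates that are sensitive to absolute values: the root-mean-square distance between two curves and Lin's concordance correlation coefficient, written out by hand in base R. The helper names `rmsd` and `concordance` are my own, and whether either measure is the right choice for this dataset is exactly what I am asking.

```r
# Invented toy curves sampled every 16 minutes over two 120-minute cycles.
time_min <- seq(0, by = 16, length.out = 15)
wave     <- sin(2 * pi * time_min / 120)

driver <- 8 + 3.0 * wave  # strong oscillation
twin   <- 8 + 2.9 * wave  # lies almost on top of the driver
flat   <- 8 + 0.1 * wave  # same shape, but barely changes

# Root-mean-square distance: 0 only if the curves coincide.
rmsd <- function(x, y) sqrt(mean((x - y)^2))

# Lin's concordance correlation coefficient (population moments):
# penalizes differences in both mean level and amplitude.
concordance <- function(x, y) {
  mx <- mean(x)
  my <- mean(y)
  sxy <- mean((x - mx) * (y - my))
  sxx <- mean((x - mx)^2)
  syy <- mean((y - my)^2)
  2 * sxy / (sxx + syy + (mx - my)^2)
}

# Pearson cannot tell the flat curve from the driver ...
print(cor(driver, flat))
# ... while the absolute measures separate them clearly.
print(c(twin = concordance(driver, twin), flat = concordance(driver, flat)))
print(c(twin = rmsd(driver, twin),        flat = rmsd(driver, flat)))
```

In this toy case Pearson gives essentially 1 for both pairs, whereas the concordance coefficient stays close to 1 only for the curve that nearly overlaps the driver.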
My hypothesis is that when the yeast is still young the genes belonging to the same regulatory units are very well coordinated and work together. But as the yeast ages this synchronicity is gradually lost. This interferes with the proper functioning of each regulatory unit to the point where it causes aging and death if the genes of a regulatory unit don't work together at all anymore.
I would therefore predict that if we did the same experiment with old yeast cells, far fewer genes would follow their cell cycle driver genes, and with a much smaller magnitude. Here is the link to the microarray dataset that I analyzed: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE8799.
The red and blue lines in my attached time series plots are CDK mutants, which can no longer cycle properly. But for us the important part for determining which genes we'd consider members of the same regulatory unit / group / GO term (Gene Ontology) is the 2 wild-type replicates, which are shown as a light green and a light blue line in my attached time series plots for over 5,100 yeast genes.
Since the amplitude of the cycling can take up more than 50% of the range, it also becomes much clearer why we could not observe any clear linear trends in the other datasets with measurements throughout the entire lifespan of the yeast: the gaps between their measurements exceed the duration of one cell cycle, i.e. 2-4 hours at most. The time on the plots' x-axis is given in minutes.
Thomas
At the end of this text I have inserted the link to my time series plots and the Nature article plus supplements. Now I need help figuring out how to make co-expression and regulatory networks from these time series plots. From visual inspection, it seems to me that the time series of genes that belong to the same GO terms are no more correlated with one another than they are with all the remaining genes. But as far as I understand, this is the basis on which co-expression networks are built. Am I misunderstanding something here?
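As far as I understand it, the conventional construction looks roughly like the sketch below: compute a gene-by-gene similarity matrix, threshold it, and treat every pair above the threshold as an edge. The curves and gene names `g1` to `g4` are invented, and the 0.85 cutoff is simply the value I mentioned above, not a recommended setting.

```r
# Invented curves: 15 time points, 16 minutes apart, two 120-minute cycles.
time_min <- seq(0, by = 16, length.out = 15)
wave     <- sin(2 * pi * time_min / 120)

expr <- rbind(
  g1 = 8.0 + 3.0 * wave,                            # strong oscillator
  g2 = 8.1 + 2.8 * wave,                            # very close to g1
  g3 = 8.0 + 0.2 * wave,                            # same shape, tiny amplitude
  g4 = 8.0 + 3.0 * cos(2 * pi * time_min / 120)     # out of phase with g1
)

# Gene-by-gene Pearson correlation (cor() works column-wise, hence t()).
sim <- cor(t(expr))

# Adjacency matrix: an edge wherever the similarity exceeds the cutoff.
adj <- sim > 0.85
diag(adj) <- FALSE

# Edge list, each undirected pair listed once.
idx   <- which(adj & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]])
print(edges)
```

Note that with Pearson as the similarity, the nearly flat `g3` ends up connected to the strong oscillators `g1` and `g2`, which is the behavior I am worried about. Swapping in a different similarity measure would only change the line that computes `sim`.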
If you can help with any answers to these questions, explanations or materials, I would be very thankful, because I need to solve these problems before I can get my degree, but I am very slow at googling things since I am legally blind. Once I know which text is important, though, I can listen to it.
So please reply to me via email at <censored> or via Skype at my Skype ID, which is <censored>. Thank you so very much in advance.