Now, despite my somewhat pessimistic comment, I have some ideas that could be helpful for the analysis.
First of all, your findings are not totally surprising to me. Assume that in any such experiment a large number of genes are not affected by the treatment or experimental conditions in question. Therefore, any correlation present in a single condition (say, condition 1) might be masked by random fluctuation in conditions 2-5. On the other hand, given the noise level of microarray (MA) experiments, correlation could also appear by chance, so you need a stringent cutoff. It is already interesting that you found some correlation structure, and these cliques are worth looking at, even though isolated nodes remain. So, here are my 50ct; they are just ideas.
- Use updated or optimized layout/annotation files, as in ffcccc's answer. I like the idea, but it might only work for Affy chips and similar platforms where you have the definition files.
- The previous point implies re-running the analysis on the raw data, including normalization and summarization.
- Try a more robust correlation coefficient, e.g. Kendall or Spearman rank correlation (though this might yield even worse scores).
- Play with the correlation cutoff. How far do you have to lower it to reduce the number of isolated graphs? Maybe it is just a little below 0.8.
- Do you see highly connected cliques?
- For isolated graphs, you can compute centroid/medoid vectors and compare them to the centroid vectors of other sub-graphs to see what is typical of their correlation pattern. Try to establish a link to other sub-graphs by means of the centroid/medoid vectors rather than the individual expression patterns.
- Work only with those genes that are called significant in each study, or that are significant after e.g. applying limma to all 5 data sets combined.
- Apply other methods, e.g. GSEA, GO analysis, cluster analysis, etc.
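To make the rank-correlation and cutoff ideas above a bit more concrete, here is a minimal sketch in plain Python. It is not what was run in the question; the gene names, the toy expression values, and the helper names (`ranks`, `spearman`, `pairwise_spearman`, `isolated_genes`) are all made up for illustration. It computes Spearman correlation as Pearson correlation on tie-averaged ranks, then reports which genes have no partner at a given absolute-correlation cutoff.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(expr):
    """Correlation for every unordered gene pair; expr maps gene -> values."""
    return {(a, b): spearman(expr[a], expr[b])
            for a, b in combinations(sorted(expr), 2)}


def isolated_genes(genes, corr, cutoff):
    """Genes without any partner at |rho| >= cutoff."""
    connected = set()
    for (a, b), rho in corr.items():
        if abs(rho) >= cutoff:
            connected.update((a, b))
    return sorted(set(genes) - connected)


# Hypothetical toy data: four genes measured across six samples.
toy_expr = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 5, 8, 10, 13],
    "g3": [6, 5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5, 9],
}

if __name__ == "__main__":
    corr = pairwise_spearman(toy_expr)
    # Sweep the cutoff downwards and watch the isolated set shrink.
    for cutoff in (0.9, 0.8, 0.7, 0.6):
        print(cutoff, isolated_genes(toy_expr, corr, cutoff))
```

On real data you would compute the pairwise correlations once and only repeat the cheap thresholding step for each cutoff, as the sketch does. Whether to use the absolute value, so that strongly anti-correlated genes also count as linked, is a choice that depends on the biological question.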
Hope this helps. Someone might come up with better ideas in light of the biological question you are addressing.