I would like to cluster, or run PCA on, microarray samples across two different platforms. I am afraid that clustering on the genes common to both platforms would be influenced more by the platform (different probes measuring different sequences of the transcripts, on different scales) than by the treatment effect. As the upregulated processes (enriched GO terms, pathways) are generally more consistent across platforms, I would like to cluster based on GO terms instead.
Suppose cells are treated with compound A, B, C or D (each in several replicates). Comparing each treatment to the untreated control yields lists of differentially regulated genes, from which I determine the enriched GO terms (say, for upregulated genes): GO.A, GO.B, GO.C and GO.D. All of this is measured on platform 1. Then I have cells treated with compound E, compared to the untreated control in the same way to get GO.E. This experiment is on platform 2. I would like to know how similar the effect of treatment E is to the effects of A, B, C and D.
One solution that comes to mind is to first find the GO terms that are present on both platforms, then compute GO.A, GO.B, GO.C, GO.D and GO.E on those common terms. GO terms that are not significantly changed (upregulated) would get a p value of 1, so I would have p values for all of the common GO terms. Then I would do, for example, PCA on the p values (I think they should be scaled first) and look at the distances among the samples.
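To make the idea concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the GO IDs and p values are made up, the function names are hypothetical, and the -log10 transform before scaling is my own guess at a sensible preprocessing step rather than something I know to be required. Instead of running PCA explicitly, it computes Euclidean distances on the scaled matrix, which should equal the distances in PCA space when all components are kept, since PCA on centred data is then only a rotation.

```python
import math

# Hypothetical enrichment p values per treatment: {GO term: p value}.
# Only significantly enriched terms are listed; all numbers are made up.
go_pvalues = {
    "A": {"GO:0006915": 0.001, "GO:0006954": 0.01},
    "B": {"GO:0006915": 0.002, "GO:0008283": 0.03},
    "C": {"GO:0006954": 0.0005, "GO:0008283": 0.02},
    "D": {"GO:0007049": 0.001},
    "E": {"GO:0006915": 0.004, "GO:0006954": 0.02},
}
# GO terms assumed to be testable on both platforms (hypothetical list).
common_terms = ["GO:0006915", "GO:0006954", "GO:0008283", "GO:0007049"]


def score_matrix(pvals, terms):
    """Rows = treatments, columns = GO terms; missing terms get p = 1."""
    return {
        name: [-math.log10(found.get(term, 1.0)) for term in terms]
        for name, found in pvals.items()
    }


def scale_columns(matrix):
    """Centre each GO term and divide by its sample standard deviation."""
    names = list(matrix)
    n_terms = len(next(iter(matrix.values())))
    scaled = {name: [] for name in names}
    for j in range(n_terms):
        column = [matrix[name][j] for name in names]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
        for name in names:
            # A constant column carries no information, so it is set to 0.
            scaled[name].append((matrix[name][j] - mean) / sd if sd > 0 else 0.0)
    return scaled


def distances_from(matrix, reference):
    """Euclidean distance from the reference treatment to every other one."""
    ref = matrix[reference]
    return {
        name: math.dist(ref, row)
        for name, row in matrix.items()
        if name != reference
    }


scaled = scale_columns(score_matrix(go_pvalues, common_terms))
dist_to_e = distances_from(scaled, "E")
for name, d in sorted(dist_to_e.items(), key=lambda kv: kv[1]):
    print(f"E vs {name}: {d:.2f}")
```

The treatment with the smallest distance would be the one whose enriched GO terms look most like those of E. Whether setting non-significant terms to p = 1 throws away too much information is exactly the part I am unsure about.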
Does this make sense? Is there a better way?
Any suggestions appreciated!