Let's say I have two separate gene expression microarray experiments comparing normal vs. cancer cells of the same cell type, run on the same platform (e.g. the Affymetrix Human Genome U133A Array). Would there be any pitfalls in aggregating the data by taking the CEL files from both, RMA-normalizing them together, and then comparing the aggregate controls to the aggregate cancer samples? And if aggregation is reasonable, would it be best to start from the CEL files, or could it work even on more processed downstream data, such as the expression values from the SOFT files in the GEO database?
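To make the main pitfall concrete: if the two studies differ by a systematic offset (a batch effect) and contain different proportions of normal and cancer samples, naively pooling them confounds study with condition. The sketch below is a minimal illustration, assuming the study effect is a simple additive shift per gene on the log2 scale. It compares a naive pooled difference with a study-adjusted estimate from the additive model y = study offset + beta × is_cancer. The function names and toy values are hypothetical, and real batch effects can be more complex than a constant shift (ComBat in the sva Bioconductor package is one commonly used alternative).

```python
from statistics import mean


def naive_difference(samples):
    """Cancer-minus-normal mean for one gene, pooling studies as if they were one.

    samples: list of (study, condition, log2_expression) tuples.
    """
    normal = [y for _, cond, y in samples if cond == "normal"]
    cancer = [y for _, cond, y in samples if cond == "cancer"]
    return mean(cancer) - mean(normal)


def study_adjusted_difference(samples):
    """Least-squares cancer effect beta in y = study_offset + beta * is_cancer.

    For this additive model, beta is the weighted average of the
    within-study differences, with weight n_cancer * n_normal / n_study.
    """
    by_study = {}
    for study, cond, y in samples:
        by_study.setdefault(study, {"normal": [], "cancer": []})[cond].append(y)
    num = den = 0.0
    for groups in by_study.values():
        n_norm, n_canc = len(groups["normal"]), len(groups["cancer"])
        if n_norm == 0 or n_canc == 0:
            continue  # a single-condition study is absorbed by its own offset
        weight = n_canc * n_norm / (n_canc + n_norm)
        num += weight * (mean(groups["cancer"]) - mean(groups["normal"]))
        den += weight
    if den == 0:
        raise ValueError("no study contains both normal and cancer samples")
    return num / den


# Hypothetical toy gene: true cancer effect +1.0, study B shifted by +2.0,
# study A mostly normal samples, study B mostly cancer samples.
toy = [
    ("A", "normal", 5.0), ("A", "normal", 5.0), ("A", "normal", 5.0),
    ("A", "cancer", 6.0),
    ("B", "normal", 7.0),
    ("B", "cancer", 8.0), ("B", "cancer", 8.0), ("B", "cancer", 8.0),
]

print(naive_difference(toy))           # 2.0
print(study_adjusted_difference(toy))  # 1.0
```

Because study B contributes mostly cancer samples, its +2 offset leaks into the naive estimate and doubles the true effect, while the study-adjusted estimate recovers the +1.0 difference.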
As Neilfws points out, it can be done, but that doesn't mean it should be done. We have an application designed specifically for this called iPathwayGuide (www.iPathwayGuide.com). It is a web-based application that doesn't require any coding experience: simply upload your CEL files in the groups you wish to analyze. iPathwayGuide will QC-check and normalize (GCRMA) the CEL files automatically. Information about DEGs, predicted miRNAs, GO terms, pathways, and diseases is provided in minutes. If you have multiple analyses, you can generate a meta report that gives you information about the overlap between the datasets.
I'd consider identifying the differentially expressed genes in each dataset and comparing those results between the datasets, rather than trying to aggregate the raw (or normalized) data all at once. We do something similar with public datasets in our ImmunoLand product (http://www.omicsoft.com/immunoland).
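Comparing DEG results between datasets, as suggested above, can start with measuring how much the two lists overlap and whether the shared genes change in the same direction. The following is a minimal sketch, not how ImmunoLand or iPathwayGuide work internally. It assumes each study has already produced a DEG list (gene mapped to its direction of change) and that both lists come from the same universe of measured genes, and it uses a hypergeometric upper-tail test for the size of the overlap. Function names and gene lists are hypothetical.

```python
from math import comb


def overlap_pvalue(k, size_a, size_b, universe):
    """Hypergeometric P(overlap >= k) for two random gene lists of the given sizes."""
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return tail / total


def compare_deg_lists(degs_a, degs_b, universe):
    """Summarize agreement between two DEG lists.

    degs_a, degs_b: dict mapping gene -> +1 (up in cancer) or -1 (down).
    universe: number of genes tested in BOTH studies (assumed shared).
    """
    shared = set(degs_a) & set(degs_b)
    union = set(degs_a) | set(degs_b)
    concordant = {g for g in shared if degs_a[g] == degs_b[g]}
    return {
        "shared": len(shared),
        "concordant": len(concordant),
        "jaccard": len(shared) / len(union) if union else 0.0,
        "p_overlap": overlap_pvalue(len(shared), len(degs_a), len(degs_b), universe),
    }


# Hypothetical DEG calls from two studies over a toy universe of 20 genes.
study_1 = {"TP53": -1, "MYC": 1, "EGFR": 1, "CDKN2A": -1}
study_2 = {"MYC": 1, "EGFR": -1, "ERBB2": 1}
result = compare_deg_lists(study_1, study_2, universe=20)
print(result["shared"], result["concordant"], result["jaccard"])  # 2 1 0.4
```

The p-value depends on the universe size, so it should be the number of genes actually tested in both studies rather than the genome size; with the toy universe of 20 genes, 2 shared genes between lists of 4 and 3 gives p ≈ 0.088. Here only one of the two shared genes (MYC) changes in the same direction in both studies, which a plain overlap count would hide.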