Predictive or diagnostic profiles derived from microarrays are indeed often used, and alas they are not very successful. That is in large part a result of the nature of the arrays and to some extent also of the nature of the diseases.
Microarrays in general measure the expression of a great many genes, so the chance of getting a false positive result for some individual gene is relatively high. The chance of repeatedly finding the same gene regulated in multiple samples as a false positive is slim when calculated per gene, but still relevant when calculated per array. Such false positive genes will normally not survive validation, but that means you have to reject the whole profile, not just the gene. Applying rigorous false discovery rate corrections would of course help, but it also lowers the chance of finding a meaningful profile in the first place.
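To make the trade-off concrete, here is a minimal sketch in plain Python. It assumes you already have one p-value per gene; the function names and the example numbers are made up for illustration, and Benjamini-Hochberg is just one common false discovery rate procedure, not necessarily the one you would use.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing an uncorrected threshold by chance alone."""
    return n_genes * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per gene: True if it passes Benjamini-Hochberg at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every gene ranked at or below that k is called significant.
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            passed[idx] = True
    return passed


# Illustrative p-values only (not real data).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
uncorrected_hits = sum(p < 0.05 for p in p_values)
corrected_hits = sum(benjamini_hochberg(p_values, fdr=0.05))
```

With a hypothetical 20,000 genes and an uncorrected threshold of 0.05 you would expect about 1,000 genes to pass by chance alone. In the toy list above, the correction removes two of the four uncorrected hits, which is exactly the loss of sensitivity mentioned.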
Complex diseases like cancer do indeed often result from aberrant gene expression, but that can also be caused by copy number variations that include the genes involved. Such copy number variations yield a larger number of affected genes, which will obscure your analysis. This can be handled by checking the individual array results for signs of such copy number variations (a number of strongly regulated genes in the same genomic location).
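A rough way to screen a single array for this is to sort the genes by genomic position and look for runs of neighbours that are all strongly regulated in the same direction. The sketch below assumes each gene comes as a (chromosome, position, log2 ratio) tuple; the threshold and minimum run length are arbitrary placeholders, and this is a crude heuristic rather than a proper copy number caller.

```python
def find_regulated_runs(genes, threshold=1.0, min_run=3):
    """Find runs of adjacent, strongly regulated genes on one chromosome.

    genes: list of (chromosome, position, log2_ratio) tuples.
    Returns a list of (chromosome, start, end, n_genes) tuples.
    """
    runs, current = [], []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    # The trailing sentinel (ratio 0.0) forces the last open run to be closed.
    for chrom, pos, ratio in ordered + [(None, 0, 0.0)]:
        strong = abs(ratio) >= threshold
        extends = (
            bool(current)
            and current[-1][0] == chrom
            and (current[-1][2] > 0) == (ratio > 0)  # same direction of regulation
        )
        if strong and extends:
            current.append((chrom, pos, ratio))
            continue
        if len(current) >= min_run:
            runs.append((current[0][0], current[0][1], current[-1][1], len(current)))
        current = [(chrom, pos, ratio)] if strong else []
    return runs


# Illustrative values only (not real data).
example = [
    ("chr8", 100, 1.5), ("chr8", 200, 1.8), ("chr8", 300, 2.1), ("chr8", 400, 0.1),
    ("chr1", 50, -1.2), ("chr1", 900, 1.4),
]
suspect_regions = find_regulated_runs(example)
```

Genes inside a flagged region can then be excluded or examined separately before building the profile.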
In general, weakly expressed genes that are measured around the detection limit of your array will yield highly variable results that often obscure the analysis. It helps to remove such genes. Unfortunately, transcription factors and other regulatory genes mostly have low expression, and these are often important for disease regulation.
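As a sketch of such a filter: keep a gene only if it is above the detection limit in a minimum fraction of samples, with an optional list of genes you want to keep regardless (for example known regulators). The detection limit, the fraction, the gene names and the values are all assumptions made for illustration.

```python
def filter_low_expression(expression, detection_limit, min_fraction=0.5, keep=()):
    """Drop genes that rarely exceed the detection limit.

    expression: dict mapping gene name -> list of intensities (one per sample).
    keep: gene names that are retained even if they fail the filter.
    """
    kept = {}
    for gene, values in expression.items():
        above = sum(v > detection_limit for v in values)
        if gene in keep or above / len(values) >= min_fraction:
            kept[gene] = values
    return kept


# Illustrative intensities only; TF_X and NOISE1 are hypothetical gene names.
expression = {
    "ACTB": [12.1, 11.8, 12.3, 12.0],
    "TF_X": [5.9, 6.2, 5.8, 5.7],
    "NOISE1": [5.5, 6.4, 5.2, 5.7],
}
strict = filter_low_expression(expression, detection_limit=6.0)
with_regulators = filter_low_expression(expression, detection_limit=6.0, keep={"TF_X"})
```

Keeping a protected list reintroduces some of the noise you were trying to remove, so it is a compromise rather than a fix.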
Some genes show high biological variation. Peripheral blood mononuclear cells (PBMCs), for instance, are often used for biomarker development since they can be easily obtained from humans, but they are important in immune and inflammatory responses and thus show highly dynamic behavior for the genes involved. This variation may be more the result of a common cold than of a tumor. This can be improved by removing such highly dynamic genes from your analysis.
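One possible way to identify such genes, assuming you have a set of healthy control samples, is to measure how much each gene varies among the controls alone and keep only the stable ones. The cutoff and the gene names below are hypothetical.

```python
import statistics


def stable_genes(control_expression, max_sd):
    """Return genes whose standard deviation across control samples is at most max_sd.

    control_expression: dict mapping gene name -> list of log intensities
    measured in healthy controls only.
    """
    return {
        gene
        for gene, values in control_expression.items()
        if statistics.stdev(values) <= max_sd
    }


# Illustrative values only; both gene names are made up.
controls = {
    "GENE_A": [8.0, 8.1, 7.9, 8.0],
    "IFN_LIKE": [6.0, 9.5, 6.2, 10.1],
}
usable = stable_genes(controls, max_sd=0.5)
```

Because the variability is estimated without any patient samples, a gene is not thrown out merely because it differs between patients and controls.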
Cellular contaminants can also cause variation. The presence of a small number of reticulocytes in your sample can, for instance, result in large variations in hemoglobin expression.
Tumors (and to a lesser extent other diseased tissues) are highly variable in composition and in microenvironment, for example oxygenation. This will result in large differences in gene expression even between otherwise equal tumor cells. This is almost impossible to prevent and by itself probably disqualifies the approach in most cases.
One thing you can try to overcome these problems, apart from the cleaning procedures suggested, is to build the profiles not from individual genes but from affected pathways or functional gene classes.
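A very simple version of this idea is to average the log ratios of all measured genes in each pathway and build the profile from those averages. The sketch below assumes you have a gene-to-pathway mapping from some annotation source; the pathway and gene names are placeholders, and a plain mean is only one of many possible summary scores.

```python
def pathway_scores(log_ratios, pathways):
    """Summarise gene-level log ratios into one score per pathway.

    log_ratios: dict mapping gene name -> log2 ratio.
    pathways: dict mapping pathway name -> list of member gene names.
    Pathways without any measured member are omitted.
    """
    scores = {}
    for name, members in pathways.items():
        measured = [log_ratios[g] for g in members if g in log_ratios]
        if measured:
            scores[name] = sum(measured) / len(measured)
    return scores


# Illustrative values only; all names are hypothetical.
log_ratios = {"G1": 1.0, "G2": 2.0, "G3": -1.0, "G4": 0.0}
pathways = {
    "pathway_A": ["G1", "G2", "G9"],  # G9 is not on the array
    "pathway_B": ["G3", "G4"],
    "pathway_C": ["G7"],              # no measured members
}
scores = pathway_scores(log_ratios, pathways)
```

Averaging over several genes dampens the noise of any single probe, which is the main reason pathway-level profiles can be more robust.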
Cancer is a complex disease that depends on genetic and environmental factors. In terms of genetic background, some mutations increase the risk of developing cancer, and DNA microarrays can be used to assess the association between mutations and cancer through genome-wide association studies (GWAS). Searching 'GWAS cancer' should get you enough reading material.
Expression arrays have also been used quite a bit as prognostic or diagnostic assays. Essentially, expression signatures are used to predict virulence or treatment effect, or to classify cancers. Such studies always use a training set to define the signature (i.e. the features of interest) and a test set to validate it. There is a wide range of publications out there: searching 'microarray cancer signature' for example should get you on the path.
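To illustrate the training/test idea, here is a toy sketch in plain Python. It picks the genes with the largest difference in class means on the training set only, then classifies held-out samples by the nearest class centroid. The selection rule, the classifier, the labels and the numbers are all assumptions chosen for brevity; published signatures use more careful statistics.

```python
def select_signature(samples, labels, n_genes):
    """Pick the gene indices with the largest mean difference between two classes."""
    first, second = sorted(set(labels))  # expects exactly two classes

    def class_mean(cls, j):
        values = [s[j] for s, l in zip(samples, labels) if l == cls]
        return sum(values) / len(values)

    n_features = len(samples[0])
    diffs = [abs(class_mean(first, j) - class_mean(second, j)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: diffs[j], reverse=True)[:n_genes]


def class_centroids(samples, labels, signature):
    """Mean expression of each signature gene, per class."""
    result = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        result[cls] = [sum(r[j] for r in rows) / len(rows) for j in signature]
    return result


def predict(sample, signature, centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((sample[j] - centroid[k]) ** 2 for k, j in enumerate(signature))
    return min(centroids, key=lambda cls: distance(centroids[cls]))


def accuracy(samples, labels, signature, centroids):
    hits = sum(predict(s, signature, centroids) == l for s, l in zip(samples, labels))
    return hits / len(labels)


# Illustrative values only: rows are samples, columns are genes.
train = [[1.0, 5.0, 3.0], [1.2, 5.1, 7.0], [4.0, 5.0, 3.1], [4.2, 4.9, 6.9]]
train_labels = ["good", "good", "poor", "poor"]
test = [[0.9, 5.0, 6.5], [3.8, 5.2, 3.3]]
test_labels = ["good", "poor"]

signature = select_signature(train, train_labels, n_genes=1)   # training set only
centroids = class_centroids(train, train_labels, signature)    # training set only
test_accuracy = accuracy(test, test_labels, signature, centroids)
```

The important point is that the test samples play no part in choosing the signature or the centroids; selecting genes on the full data set and then "validating" on a subset gives overly optimistic results.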
My answer might be a bit vague, but there is too much out there and your question is very general.
To add to Laurent's answer, microarrays are also often used to predict the risk of metastasis in cancer patients. For example, the following paper describes a microarray-based method for predicting metastasis risk for breast cancer patients. The paper describes the clustering/statistical methods involved.