6.4 years ago by
The rationale is that you need to be able to apply the classifier to a single sample. This typically requires working with an intensity value (most likely from the cancer sample, in the example that you provided).
If the data type warrants analysis of a log2 ratio (such as aCGH data for copy number calls), then that is OK. However, you have a couple of additional considerations with gene expression data:
1) Does the normal sample provide useful additional information beyond what you can predict with the tumor sample? In practice, you don't want to run measurements twice if it can be avoided.
2) What is the biological significance of the normal sample? Can you truly call it an example of an unaffected tissue that is equivalent to the tumor tissue? This is not such a big deal with DNA analysis, but it is important for RNA analysis. For example, I've been knocked for assuming that tissue adjacent to the tumor is equivalent to unaffected normal tissue from another patient (which *shouldn't* be used in your classifier). Differences between the two could be due to the proportion of epithelial cells rather than a pathogenic aberration: http://breast-cancer-research.com/content/12/5/R87
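To make the single-sample point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the intensity values are made up, the labels `subtype_A` and `subtype_B` are hypothetical, and the nearest-centroid rule is just a stand-in for whatever classifier you actually use. The only thing it is meant to show is which inputs each feature type needs at prediction time.

```python
import numpy as np

def log2_intensity(tumor):
    """Single-sample feature: needs only the tumor measurement."""
    return np.log2(np.asarray(tumor, dtype=float))

def log2_ratio(tumor, normal):
    """Paired feature: also needs a matched normal sample for the same patient."""
    return log2_intensity(tumor) - np.log2(np.asarray(normal, dtype=float))

def fit_centroids(X, y):
    """Toy nearest-centroid model: one mean feature vector per class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {label: X[y == label].mean(axis=0) for label in np.unique(y).tolist()}

def predict(centroids, x):
    """Return the label whose centroid is closest (Euclidean) to x."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# Made-up training intensities for two genes (rows = tumor samples).
train_tumor = [[1024, 64], [2048, 32], [64, 1024], [32, 2048]]
train_labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

centroids = fit_centroids(log2_intensity(train_tumor), train_labels)

# A new patient: the intensity-based model needs only the tumor sample.
new_tumor = [512, 128]
print(predict(centroids, log2_intensity(new_tumor)))

# A ratio-based model could not be applied without this second measurement.
new_normal = [256, 256]
print(log2_ratio(new_tumor, new_normal))
```

If you train on `log2_ratio` instead, every future patient needs a matched normal run as well, which is where the two questions above come in.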