8.0 years ago by
I'm posting this to move the question out of the unanswered state. In agreement with most of the comments already given: at least in this case, it doesn't make much sense to use PCA for gene ranking. You have a case vs. control setting, which means you have a "2-dimensional" problem, whereas the applications of PCA described in the paper are directed towards time series or other higher-dimensional measurements.
Therefore you will get at most 2 principal components, and if you wanted to remove one, e.g. for dimension reduction or noise reduction, you would have only one left. That is not a good basis for a statistical test in which you wish to compare two conditions.
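To make the dimensionality point concrete, here is a minimal sketch in plain Python. It assumes the replicates have already been summarized to one mean per condition, so each gene is a point with two coordinates (control, case). The gene names and expression values are made up for illustration, and the helper functions are hypothetical, not taken from the paper.

```python
import math

# Hypothetical per-gene mean expression (made-up values): (control, case)
expr = {
    "geneA": (5.0, 5.2),
    "geneB": (3.1, 6.0),
    "geneC": (7.4, 7.1),
    "geneD": (2.0, 2.3),
}

def covariance_2x2(points):
    """Sample covariance matrix entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def eigenvalues_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return half_trace + radius, half_trace - radius

sxx, sxy, syy = covariance_2x2(list(expr.values()))
# Each eigenvalue is the variance along one principal component.
variances = eigenvalues_2x2(sxx, sxy, syy)
print("number of principal components:", len(variances))
print("variance per component:", variances)
```

However many genes you add to `expr`, the covariance matrix stays 2x2, so there are never more than two components to work with.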
Of course, one could rank the genes by their factor loadings (projection of the data onto the first principal axis), but that doesn't seem to have any advantage in a case-control setting. A statistical test has the advantage of providing an estimate of significance (i.e. p-values) and allows you to estimate power, etc. PCA is a totally different technique and doesn't provide these estimates. Unless you can better define the use case and explain why a non-standard method should be applied, I would stick with an established method.
You didn't say whether you have replicates, but I guess you do; so if you wanted to use PCA, you would need to decide at which point in your analysis to summarize the replicates. At that point, however, you are going to lose information about within-group variance. A statistical test such as ANOVA needs the within-group variance in order to compare it to the between-group variance. Therefore, it is important to keep the within-group variance until the statistical test.
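As a rough illustration of why the within-group variance matters, here is a small sketch of the one-way ANOVA F statistic computed by hand for a single gene. The replicate values and the names `f_statistic`, `tight` and `noisy` are invented for this example; in practice you would use an established implementation (for instance `scipy.stats.f_oneway`, which also returns a p-value).

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Two hypothetical genes, each as [control replicates, case replicates].
# Both have the same group means (5 and 6) but very different spread.
tight = [[5.0, 5.1, 4.9], [6.0, 6.1, 5.9]]
noisy = [[3.0, 5.0, 7.0], [4.0, 6.0, 8.0]]

f_tight = f_statistic(tight)
f_noisy = f_statistic(noisy)
print("F (tight replicates):", f_tight)
print("F (noisy replicates):", f_noisy)

# After averaging the replicates, the two genes are indistinguishable.
print([mean(g) for g in tight], [mean(g) for g in noisy])
```

Both genes show the same difference between group means, yet F is about 150 for the tight gene and 0.375 for the noisy one. Once the replicates are averaged, that distinction is gone.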