I have been using the following tutorial by Stephen Turner and Will Bush to look at some RNA-seq data.
According to GAGE's documentation, they appear to be using it in a somewhat non-standard way. Specifically, they seem to use it for a GSEA-esque analysis, feeding it a vector of fold changes named by Entrez ID and looking for enrichment within pathways contained in the
Were this a standard GSEA run, I would order transcripts by log2 fold change prior to analysis. In this use of GAGE, should transcripts also be rank ordered beforehand? Running it both ways makes a large difference, at least with my data.
Yes. My understanding now is that the difference may be caused by gene IDs in a given ontology list mapping to multiple transcripts of the same gene in the data. I believe GAGE expects a one-to-one mapping between a measure of DE and a gene, not several measures of DE (say, one per isoform) that all map to the same gene ID in the ontology list.
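To illustrate the one-to-one mapping I mean, here is a minimal Python sketch that collapses transcript-level fold changes to a single value per Entrez ID before handing them to an enrichment tool. This is not part of GAGE or the tutorial: the function name, the example IDs and values, and the choice to keep the fold change with the largest absolute magnitude are all my own assumptions for illustration. Averaging, or using gene-level DE results in the first place, may well be more appropriate.

```python
from collections import defaultdict


def collapse_to_gene_level(transcript_fold_changes):
    """Collapse (entrez_id, log2fc) pairs to one log2fc per Entrez ID.

    Assumption: for genes with several transcripts, keep the fold change
    with the largest absolute value. This rule is illustrative only.
    """
    by_gene = defaultdict(list)
    for entrez_id, log2fc in transcript_fold_changes:
        by_gene[entrez_id].append(log2fc)
    # max(..., key=abs) returns the original signed value, not its magnitude
    return {gene: max(values, key=abs) for gene, values in by_gene.items()}


# Hypothetical transcript-level results: two genes have two isoforms each
transcripts = [
    ("7157", 1.2),
    ("7157", -0.3),
    ("672", -2.0),
    ("672", -2.5),
    ("1956", 0.8),
]

gene_level = collapse_to_gene_level(transcripts)
for gene, log2fc in gene_level.items():
    print(gene, log2fc)
```

After this step each Entrez ID appears exactly once, so the input no longer carries competing measurements for the same gene.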
In this sense, it does not appear to be the best approach for RNA-seq data, and it certainly doesn't account for things like transcript length and expression-level biases. I have since started working with ontology enrichment tools tailored specifically to RNA-seq data, such as GOseq. It is a shame, because Pathview looked very nice for generating easily understood figures.