4.1 years ago by
You want GSEA (Gene Set Enrichment Analysis), which is rank-based rather than based on raw scores, and can be used to see how strongly a sample or group is enriched for a specific gene signature. You can either run standard GSEA (the software is available on the Broad Institute's website) or single-sample GSEA (ssGSEA), which lets you compare samples against each other. In fact, I'm currently working on something that uses this approach to measure the level of contamination from non-targeted tissues in samples (and this seriously screws up expression analysis, particularly in RNA-seq). If you need ssGSEA, just let me know and I can post some R code for it, to save you the trouble of finding it or working out how to do it yourself.
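To make "rank-based" concrete, here is a minimal Python sketch of the weighted running-sum enrichment score described in the original GSEA publication. Genes are ranked by a statistic (for example, correlation with a phenotype), the running sum steps up for genes in the set (weighted by |statistic|^p, with p = 1 assumed here) and steps down for genes outside it, and the enrichment score (ES) is the largest deviation from zero. The function name and toy values are hypothetical. The Broad's software also normalizes scores and estimates significance by permutation, which this sketch leaves out.

```python
from typing import Dict, Iterable, List, Tuple


def gsea_enrichment_score(
    stats: Dict[str, float], gene_set: Iterable[str], p: float = 1.0
) -> Tuple[float, List[float]]:
    """Weighted running-sum enrichment score for one ranked gene list.

    stats maps gene -> ranking statistic. Returns (ES, running-sum trace).
    No permutation test is performed.
    """
    members = set(gene_set)
    # Rank genes from most positive to most negative statistic.
    ranked = sorted(stats, key=lambda g: stats[g], reverse=True)
    hits = [g for g in ranked if g in members]
    n_miss = len(ranked) - len(hits)
    if not hits or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    # Genes in the set step up by |stat|**p, normalized so the steps sum to 1;
    # genes outside the set step down by equal amounts that also sum to 1.
    hit_norm = sum(abs(stats[g]) ** p for g in hits)
    if hit_norm == 0:
        raise ValueError("all statistics in the gene set are zero")
    running, trace = 0.0, []
    for g in ranked:
        if g in members:
            running += abs(stats[g]) ** p / hit_norm
        else:
            running -= 1.0 / n_miss
        trace.append(running)
    # ES is the signed maximum deviation of the running sum from zero.
    es = max(trace, key=abs)
    return es, trace


# Toy ranked list (hypothetical values, for illustration only).
stats = {"A": 3.0, "B": 2.5, "C": 1.0, "D": 0.2,
         "E": -0.5, "F": -1.5, "G": -2.0, "H": -3.0}
es, trace = gsea_enrichment_score(stats, {"A", "B", "D"})
es_low, _ = gsea_enrichment_score(stats, {"F", "G", "H"})
print(round(es, 3), round(es_low, 3))  # 0.965 -1.0
```

A set concentrated at the top of the ranking gives a positive ES and one concentrated at the bottom gives a negative ES. Because only the ordering and the set membership matter (plus the |statistic| weights), the result does not depend on the raw expression scale.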
Edit: Since I was asked, the original GSEA publication looked mainly at how survival in different cancer types tends to involve recruitment of the same pathways, something you wouldn't find by looking at individual genes. The general idea behind the method is to test whether the genes in a given predefined set are up- or down-regulated in relation to something of interest. This seems to be done mainly in cancer, looking at survival or cell type of origin (if you have only a few cell types of interest and get a list of DE genes between them, then you would expect a cancer originating from cell type A to show a more enriched signature for A than a cancer arising from cell type B). My own use of it focuses more on how a given sample might actually be a mixture of multiple sources (a "signature" of one of the sources can be used to gauge how heavily it's present) and how that, when not accounted for, can lead to incorrect DE results (and then how one might correct for this and screen for it ahead of time). At the end of the day, this is quite similar to cancer papers looking at tumour sample purity, like this one.
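For the single-sample use described above (gauging how much of one source is present in each sample), a minimal Python sketch of an ssGSEA-style score might look like the following. Within one sample, genes are ranked by expression, and the score sums, over every position in the ranking, the gap between a rank-weighted ECDF of the signature genes and the ECDF of all other genes. The exponent alpha = 0.25, the function names, and the toy "target" and "contaminant" profiles are assumptions for illustration, not the R code offered above and not real data. To keep the toy simple, both tissues share the same background genes, so only the signature genes change with the mixing fraction.

```python
from typing import Dict, Iterable


def ssgsea_score(expr: Dict[str, float], gene_set: Iterable[str],
                 alpha: float = 0.25) -> float:
    """Unnormalized ssGSEA-style enrichment score for a single sample."""
    members = set(gene_set) & set(expr)
    n = len(expr)
    n_out = n - len(members)
    if not members or n_out == 0:
        raise ValueError("gene set must overlap the sample without covering it")
    # Walk genes from highest to lowest expression; the top gene gets rank n.
    ordered = sorted(expr, key=lambda g: expr[g], reverse=True)
    rank = {g: n - i for i, g in enumerate(ordered)}
    in_norm = sum(rank[g] ** alpha for g in members)
    cdf_in = cdf_out = score = 0.0
    for g in ordered:
        if g in members:
            cdf_in += rank[g] ** alpha / in_norm  # rank-weighted ECDF step
        else:
            cdf_out += 1.0 / n_out  # uniform ECDF step
        score += cdf_in - cdf_out  # sum the gap over every position
    return score


def mix(target: Dict[str, float], contaminant: Dict[str, float],
        frac: float) -> Dict[str, float]:
    """Linear mixture: (1 - frac) of the target plus frac of the contaminant."""
    return {g: (1 - frac) * target[g] + frac * contaminant[g] for g in target}


# Hypothetical toy profiles: 20 signature genes that are high only in the
# contaminating tissue, plus 180 background genes shared by both tissues.
signature = [f"sig{i}" for i in range(20)]
background = [f"bg{i}" for i in range(180)]
target = {g: 0.5 for g in signature}
target.update({g: 1.0 + 0.1 * i for i, g in enumerate(background)})
contaminant = dict(target)
contaminant.update({g: 30.0 for g in signature})

fractions = [0.0, 0.1, 0.3, 0.6, 1.0]
scores = [ssgsea_score(mix(target, contaminant, f), signature) for f in fractions]
for f, s in zip(fractions, scores):
    print(f"contaminant fraction {f:.1f}: score {s:.1f}")
```

In this toy the score is negative when the contaminant is absent and rises steadily with the contaminant fraction, which is the behaviour one would screen for across real samples. Some implementations additionally normalize the raw scores across samples.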
modified 4.1 years ago
4.1 years ago by Devon Ryan