Question

GSEA: how to rank

0

5.9 years ago

jin.k.koo • 0

Hi. I am a beginner reading the PNAS paper, "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles" by Subramanian et al. The paper says that genes are ranked by their expression difference, but in Fig. 1A there are multiple attributes (i.e. columns) for each gene within each class. If there are multiple fields for each gene, how can we order the genes? Relatedly, I am wondering what the columns in Fig. 1A are. Any help will be appreciated. Thank you!

gene enrichment • 2.0k views

ADD COMMENT • link updated 5.9 years ago by russhh 5.7k • written 5.9 years ago by jin.k.koo • 0

score 2 · Answer 1 · 2018-05-23

2

5.9 years ago

russhh 5.7k

The heatmap in figure 1A gives the estimated expression level for each feature in each sample.

When we run a differential expression analysis, we estimate the difference in expression for a given gene between one set of samples and another set of samples (in the simplest case). So although there are multiple measurements for a given gene (one per sample), these are reduced to a single statistic (one per comparison) for that gene.
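As a rough illustration of that reduction, here is a minimal sketch that collapses the per-sample values of each gene into one Welch t-statistic. The expression values, gene names and the `treated`/`control` labels are made up for the example, and a real analysis would normally use limma, edgeR or DESeq2 rather than a hand-rolled t-statistic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic: difference in means divided by its standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def gene_statistics(expression, labels, case="treated"):
    """Reduce each gene's per-sample values to a single statistic."""
    stats = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == case]
        b = [v for v, lab in zip(values, labels) if lab != case]
        stats[gene] = welch_t(a, b)
    return stats


# Hypothetical data: one row per gene, one value per sample (the heatmap columns).
expression = {
    "geneA": [8.1, 7.9, 8.3, 5.0, 5.2, 4.9],
    "geneB": [6.0, 6.2, 5.9, 6.1, 6.0, 6.3],
    "geneC": [3.1, 2.9, 3.0, 6.8, 7.1, 7.0],
}
labels = ["treated", "treated", "treated", "control", "control", "control"]

stats = gene_statistics(expression, labels)

# One number per gene, so the genes can now be ordered.
ranked = sorted(stats, key=stats.get, reverse=True)
print(ranked)
```

Genes that are higher in the treated samples end up at the top of the list and genes that are lower end up at the bottom.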

That statistic is computed for every gene, and the collection of those values is used in GSEA. I tend to use signed ranks of the differential-expression p-values (the rank of each gene's p-value, signed by the direction of the change), one per gene, as the input to GSEA.
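One way to read "signed ranks of the p-values" is sketched below. This is an interpretation, not necessarily the exact procedure used above: the largest p-value gets rank 1, the smallest gets rank n, and the rank takes the sign of the log fold change. The `de_results` table is hypothetical, and ties between p-values are not handled.

```python
def signed_rank_scores(results):
    """Map gene -> signed rank, given gene -> (p_value, log_fold_change)."""
    # Sort from largest to smallest p-value so that stronger evidence
    # receives a larger rank.
    ordered = sorted(results, key=lambda gene: results[gene][0], reverse=True)
    scores = {}
    for rank, gene in enumerate(ordered, start=1):
        lfc = results[gene][1]
        sign = 1 if lfc > 0 else -1 if lfc < 0 else 0
        scores[gene] = sign * rank
    return scores


# Hypothetical differential-expression output: (p-value, log fold change).
de_results = {
    "geneA": (0.0001, 2.1),
    "geneB": (0.40, -0.1),
    "geneC": (0.00001, -3.5),
    "geneD": (0.03, 0.8),
}

scores = signed_rank_scores(de_results)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

Strongly up-regulated genes sit at one end of the ranking, strongly down-regulated genes at the other, and genes with weak evidence fall in the middle.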

ADD COMMENT • link 5.9 years ago by russhh 5.7k

0

Thank you very much for your answer! Can you tell me a few other options for "a single statistic", or point me to a website where I can read about them? I assume that the statistic should change after permuting the phenotype labels.

ADD REPLY • link 5.9 years ago by jin.k.koo • 0

0

Not quite sure what you mean. Differential expression analysis will give you, for each gene, a p-value with respect to the null hypothesis of no difference between the two groups. The methods for computing differential expression (from microarray or RNA-Seq data) that are used in the literature, like limma, edgeR etc., are based on linear or generalised linear models. If you've used t-tests and analysis of variance, you could have a look at the user guides for those tools (limma / edgeR / DESeq2, that is).

If you've never conducted a t-test between two groups, learn to do that first.
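On the question of other statistics and label permutation, a small sketch may help. It uses the signal-to-noise ratio (difference in class means divided by the sum of the class standard deviations), which is one common alternative to a t-statistic, and shows that the value does change when the phenotype labels are shuffled. The data and names are hypothetical, and this only illustrates the per-gene statistic; GSEA itself uses the re-ranked lists from such permutations to build a null distribution for the enrichment score.

```python
import random
from statistics import mean, stdev


def signal_to_noise(values, labels, case="treated"):
    """(mean_case - mean_other) / (sd_case + sd_other) for one gene."""
    a = [v for v, lab in zip(values, labels) if lab == case]
    b = [v for v, lab in zip(values, labels) if lab != case]
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


# Hypothetical expression values for a single gene across six samples.
values = [8.1, 7.9, 8.3, 5.0, 5.2, 4.9]
labels = ["treated"] * 3 + ["control"] * 3

observed = signal_to_noise(values, labels)

# Shuffle the labels many times and recompute the statistic each time.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
permuted_stats = []
for _ in range(1000):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    permuted_stats.append(signal_to_noise(values, shuffled))

# Fraction of shuffles giving a statistic at least as extreme as the observed one.
p_perm = sum(abs(s) >= abs(observed) for s in permuted_stats) / len(permuted_stats)
print(observed, p_perm)
```

With only six samples there are just 20 distinct ways to split them three against three, so the permutation p-value cannot get very small here; that is a limitation of the toy data, not of the approach.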

ADD REPLY • link 5.9 years ago by russhh 5.7k

0

Sorry for the confusion. I was asking if there is an alternative to the differential-expression-p-values that can represent multiple measurements for a gene.

ADD REPLY • link 5.9 years ago by jin.k.koo • 0

0

If you have several values for each gene in your dataset and want to do a kind of holistic GSEA over those values, I'm not entirely sure how you'd approach it. It certainly isn't something I've ever done.

An example where I'd have several values for each gene is when I have multiple contrasts (e.g. treatment1 vs control and treatment2 vs control) in the same experiment. In that setting I typically run fGSEA separately for each individual contrast (and reduce the GSEA p-value threshold). Admittedly my approach is a bit flawed: if the different contrasts use the same controls, there's an implicit dependence between the fGSEA results for one contrast and another (if a geneset is artificially low in the controls, that geneset will be significant in fGSEA for both contrasts). So I can see the value in having a multivariate GSEA. I wouldn't know how to implement it (or interpret it) though.
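The per-contrast approach could be sketched as follows. The geneset p-values are invented, and dividing the threshold by the number of contrasts (a Bonferroni-style correction) is just one assumed way of "reducing the threshold"; in practice the p-values would usually also be adjusted across genesets first.

```python
def significant_sets(pvalues_by_contrast, alpha=0.05):
    """Return the genesets passing a threshold tightened for the number of contrasts."""
    threshold = alpha / len(pvalues_by_contrast)  # Bonferroni-style, an assumption
    return {
        contrast: sorted(gs for gs, p in pvals.items() if p < threshold)
        for contrast, pvals in pvalues_by_contrast.items()
    }


# Hypothetical GSEA p-values, one table per contrast.
gsea_pvalues = {
    "treatment1_vs_control": {"setX": 0.001, "setY": 0.04, "setZ": 0.30},
    "treatment2_vs_control": {"setX": 0.002, "setY": 0.20, "setZ": 0.01},
}

hits = significant_sets(gsea_pvalues)

# Genesets significant in both contrasts deserve a second look when the
# contrasts share the same controls, for the reason described above.
shared = set(hits["treatment1_vs_control"]) & set(hits["treatment2_vs_control"])
print(hits)
print(shared)
```

Here `setY` passes the usual 0.05 cut-off in the first contrast but not the tightened 0.025 one, and `setX` is flagged in both contrasts, which is exactly the case where a shared-control artefact cannot be ruled out from the GSEA results alone.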

ADD REPLY • link 5.9 years ago by russhh 5.7k