Question

How to rank a gene list using correlation coefficient and p-values for GSEA/clusterProfiler?

7 days ago

c-wes ▴ 10

I have two datasets: a bulk RNA-seq dataset covering different samples, and a viability dataset with scores giving the effect of each treatment on each sample.

From these datasets, I calculated the correlation of each gene's expression profile with the viability scores for each treatment. So for each treatment, I have a list of genes, each with its correlation with viability under that treatment and a p-value.

I want to rank this list as input for GSEA (I previously used gseKEGG in clusterProfiler). What is the best way to rank this gene list?

Possibilities I could imagine:

correlation coefficient (from +1 to -1)
sign(correlation coefficient) * -log10(p-value)
p-value (from 0 to 1)

Using solely the correlation coefficient would be easier to interpret, as a high correlation would directly correspond to the effect of treatment on viability (gene correlated with sensitivity or resistance to treatment).

I would not really know how to interpret a ranking based solely on the p-value.

Using the combined formula with the correlation coefficient's sign and the p-value would be more interpretable than the p-value alone, but also a bit messy (gene significantly correlated with sensitivity or resistance to treatment). I took this formula from its use in ranking differential expression results (sign(FoldChange) * -log10(p)), but I've also seen some criticism of it.
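To make the three options concrete, here is a minimal Python sketch of how each ranking would be computed. The gene names and the (correlation, p-value) pairs are made-up placeholders, not real results, and the helper names `signed_log_p` and `ranked` are only for illustration. The p-value floor is my own assumption to avoid infinite scores when p is reported as 0.

```python
import math

# Hypothetical per-gene results for one treatment: (correlation, p-value).
# These numbers are invented for illustration only.
results = {
    "GENE_A": (0.82, 1e-6),
    "GENE_B": (-0.75, 1e-5),
    "GENE_C": (0.10, 0.60),
    "GENE_D": (-0.05, 0.85),
}

def signed_log_p(r, p, floor=1e-300):
    """sign(r) * -log10(p); p is floored (assumed value) so p == 0 stays finite."""
    sign = (r > 0) - (r < 0)
    return sign * -math.log10(max(p, floor))

def ranked(scores):
    """Return (gene, score) pairs sorted in decreasing order of score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Option 1: correlation coefficient only.
by_r = ranked({g: r for g, (r, p) in results.items()})

# Option 2: sign(correlation) * -log10(p-value).
by_signed_p = ranked({g: signed_log_p(r, p) for g, (r, p) in results.items()})

# Option 3: p-value only (direction of the correlation is lost).
by_p = ranked({g: p for g, (r, p) in results.items()})

for name, ranking in [("r", by_r), ("signed -log10(p)", by_signed_p), ("p", by_p)]:
    print(name, [gene for gene, _ in ranking])
```

In this toy example the first two options put the strongly positive and strongly negative genes at opposite ends of the list, while ranking by p-value alone puts them next to each other, which is why I don't know how to interpret it. As far as I understand, the GSEA functions in clusterProfiler expect the scores as a named numeric vector sorted in decreasing order, which is what the sorting here mimics.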

Overall, I can't really find a good source for how to rank genes for GSEA outside of differential expression.

ClusterProfiler GSEA correlation rank genelist • 285 views
