I have two datasets: a bulk RNA-seq dataset for a set of samples, and a viability dataset with scores giving the effect of each treatment on each sample.
From these datasets, I calculated the correlation between each gene's expression profile and the viability scores for each treatment. So for each treatment, I have a list of genes, each with its correlation with the treatment viability and a p-value.
I want to rank this list as input for GSEA (I previously used gseKEGG from clusterProfiler). What is the best way to rank this gene list?
Possibilities I could imagine:
- correlation coefficient (from +1 to -1)
- sign(correlation coefficient) * -log10(p-value)
- p-value (from 0 to 1)
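
To make the three options concrete, here is a minimal Python sketch of how each ranking could be built from the per-gene correlations and p-values I already have. The gene names and numbers are made up, and `signed_log_p` and `rank_genes` are hypothetical helpers, not functions from any package. The p-value floor is my own assumption to avoid taking the log of zero.

```python
import math

# Made-up per-gene results for one treatment: gene -> (correlation, p-value).
results = {
    "GENE_A": (0.80, 1e-6),
    "GENE_B": (-0.65, 1e-4),
    "GENE_C": (0.10, 0.50),
    "GENE_D": (-0.30, 0.04),
}

def signed_log_p(r, p, floor=1e-300):
    """sign(r) * -log10(p); p is floored so that p == 0 does not give infinity."""
    sign = (r > 0) - (r < 0)
    return sign * -math.log10(max(p, floor))

def rank_genes(results, metric):
    """Return (gene, score) pairs sorted by decreasing score."""
    scores = {gene: metric(r, p) for gene, (r, p) in results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Option 1: correlation coefficient only.
by_r = rank_genes(results, lambda r, p: r)
# Option 2: sign(correlation) * -log10(p-value).
by_signed_p = rank_genes(results, signed_log_p)
# Option 3: p-value only (as -log10(p), so the most significant genes come first).
by_p = rank_genes(results, lambda r, p: -math.log10(max(p, 1e-300)))

for name, ranking in [("r", by_r), ("signed -log10(p)", by_signed_p), ("-log10(p)", by_p)]:
    print(name, [gene for gene, _ in ranking])
```

With these made-up numbers, the first two options give the same gene order, while the p-value-only ranking puts GENE_A and GENE_B next to each other at the top even though they correlate in opposite directions.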
Using only the correlation coefficient would be the easiest to interpret, as a high correlation would directly correspond to the effect of the treatment on viability (gene correlated with sensitivity or resistance to treatment).
I would not really know how to interpret a ranking based on the p-value alone.
Combining the sign of the correlation coefficient with the p-value would be more interpretable than the p-value alone, but also a bit messy (gene significantly correlated with sensitivity or resistance to treatment). I took this formula from differential expression, where I've seen sign(FoldChange) * -log10(p) used for ranking, but I've also seen some criticism of it.
Overall, I can't really find a good source for how to rank genes for GSEA outside of differential expression.