I am doing pathway and gene ontology analysis using Gene Set Enrichment Analysis (GSEA). These tools require a ranked gene list as input; however, various papers have provided different recommendations on how to build it.
Is there a current consensus on the ideal way to do this? I've been using log2 fold change, and I am unsure whether to use fold change or p-values instead, or another method.
One post, "Problem with creating GSEA rank file", recommended signed p-values, but I haven't found any literature reviews or clarification on the issue. clusterProfiler mentions fold change for ranked gene lists, so I am unsure whether I would be getting "bad results" by sorting on p-values, and whether the different packages are optimized for one sorting or the other.
According to Yu, the author of clusterProfiler:
geneList contains three features:
1. numeric vector: fold change or other type of numerical variable
2. named vector: every number has a name, the corresponding gene ID
3. sorted vector: number should be sorted in decreasing order
https://github.com/GuangchuangYu/DOSE/wiki/how-to-prepare-your-own-geneList
"other type of numerical variable" is unclear. Perhaps there are other methods similar to GSEA that have a more concrete way of doing things?
EDIT: for clusterProfiler's gseGO() function, I get different results when ranking by Log2FoldChange versus FoldChange.
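One plausible explanation for the difference, assuming the enrichment score weights each gene by the magnitude of its ranking value (as the weighted GSEA statistic does): log2 is monotonic, so both metrics sort the genes in the same order, but the weights they carry differ. The Python sketch below uses made-up gene names and fold changes purely for illustration; it is not clusterProfiler code.

```python
import math

# Hypothetical fold changes (treated / control); values below 1 mean down-regulation.
fold_change = {"GENE_A": 4.0, "GENE_B": 1.5, "GENE_C": 0.5, "GENE_D": 0.25}
log2_fc = {gene: math.log2(fc) for gene, fc in fold_change.items()}


def rank_genes(scores):
    """Return gene IDs sorted by score in decreasing order."""
    return sorted(scores, key=scores.get, reverse=True)


# The ordering is identical because log2 preserves order.
same_order = rank_genes(fold_change) == rank_genes(log2_fc)

# The magnitudes are not: a 4-fold increase and a 4-fold decrease are
# symmetric on the log2 scale but very different on the raw scale.
fc_weights = {gene: abs(value) for gene, value in fold_change.items()}
log_weights = {gene: abs(value) for gene, value in log2_fc.items()}

print(same_order)
print(fc_weights["GENE_A"], fc_weights["GENE_D"])
print(log_weights["GENE_A"], log_weights["GENE_D"])
```

On the raw scale the down-regulated genes end up with small positive values, so they contribute little weight and carry no negative sign, which is one reason log2 fold change is the more natural of the two for a signed ranking.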
Just because you see something in published papers doesn't mean it's good or recommended. Many authors miss things or lack a deep knowledge of what they are doing, and such technical details are often not scrutinized by peer reviewers, even in high-impact papers.
Ranking metrics based only on logFC or only on p-value (which includes the signed p-value approach above, since it uses the logFC only to get the direction) each have their downsides: genes ranked by logFC are biased by the larger variance of genes with low counts, and genes ranked by p-value are biased toward genes with higher abundance and longer transcripts. See https://support.bioconductor.org/p/85681/
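To make the two metrics concrete, here is a minimal Python sketch that ranks the same genes by logFC and by signed p-value, taken here as -log10(p) carrying the sign of the logFC. The gene names, values, the `signed_log_p` helper and its `floor` parameter are all hypothetical, chosen only to show how the two rankings can disagree.

```python
import math

# Hypothetical differential-expression results: gene -> (log2 fold change, p-value).
results = {
    "GENE_A": (3.0, 0.04),   # large change, weak evidence
    "GENE_B": (0.8, 1e-8),   # small change, strong evidence
    "GENE_C": (-2.5, 0.001),
    "GENE_D": (-0.5, 1e-6),
}


def signed_log_p(log2fc, pvalue, floor=1e-300):
    """Return -log10(p) with the sign of the log fold change.

    `floor` guards against p-values reported as exactly 0.
    """
    sign = (log2fc > 0) - (log2fc < 0)
    return sign * -math.log10(max(pvalue, floor))


def rank_genes(scores):
    """Return gene IDs sorted by score in decreasing order."""
    return sorted(scores, key=scores.get, reverse=True)


by_logfc = rank_genes({g: lfc for g, (lfc, p) in results.items()})
by_signed_p = rank_genes({g: signed_log_p(lfc, p) for g, (lfc, p) in results.items()})

print(by_logfc)
print(by_signed_p)
```

In this toy data the gene with the largest fold change tops the logFC ranking, while the gene with the smallest p-value tops the signed p-value ranking, so the two ends of the ranked list are populated by different genes.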