This seems to be a simple question.
You have a list of genes from a DEG analysis, with p-values, FDRs, logFCs, etc. Previously, what I did for GSEA was to keep genes with FDR < 0.25 or 0.05, rank them by logFC (in other words, pre-rank the genes by logFC), and then run GSEA. Now I am wondering whether this is a good approach:
- There might be too many genes (typically ~50%). Assuming there are usually 4-5 pathways involved and each pathway has about 500 genes, the top 2,000 genes might be enough to include for GSEA.
- I'm not sure logFC is the best way to rank genes. Maybe use -log(PValue) as the magnitude of the rank score and the sign of logFC as the sign of the score? That is, use sign(logFC) * (-log(PValue)) as the rank score?
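
To make the second idea concrete, here is a minimal sketch of the sign(logFC) * (-log(PValue)) score in plain Python. The gene names and values are made up for illustration, the base-10 log is my assumption (the base only rescales the scores and does not change the ranking), and `min_p` is a hypothetical floor so that a p-value reported as 0 doesn't produce an infinite score.

```python
import math

def rank_score(log_fc, p_value, min_p=1e-300):
    """Signed score: sign(logFC) * -log10(p-value).

    min_p is an assumed floor so that p-values of 0 stay finite.
    """
    p = max(p_value, min_p)
    sign = (log_fc > 0) - (log_fc < 0)  # -1, 0, or +1
    return sign * -math.log10(p)

# Hypothetical DEG rows: (gene, logFC, p-value)
degs = [
    ("GENE_A", 2.1, 1e-8),
    ("GENE_B", -1.5, 1e-4),
    ("GENE_C", 0.3, 0.5),
    ("GENE_D", -0.2, 0.01),
]

# Sort from most significantly up-regulated to most significantly down-regulated
ranked = sorted(
    ((gene, rank_score(lfc, p)) for gene, lfc, p in degs),
    key=lambda item: item[1],
    reverse=True,
)

for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

With this score, a gene with a small fold change but a very small p-value can outrank a gene with a large fold change and a weak p-value, which is the main practical difference from ranking by logFC alone.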
I googled briefly but didn't find a convention.