I am trying to use the GSEA GUI from the Broad Institute to do gene set analysis on RNA-seq data. I have read many posts and searched the GSEA website for information on the DESeq2 -> GSEA workflow, and here is what I understood from it.

So if I use the DESeq2 package to get a list of DE genes and want to run the GSEAPreranked tool, I would:

  1. get a table with genes as rows and log2FoldChange, p-value, and adjusted p-value as columns
  2. order the gene list by the metric -log10(p-value)*sign(log2FoldChange) and create a rank file (.rnk) in R
  3. load this file into the GSEA software and run GSEAPreranked after filling in the required and basic fields (making sure the enrichment statistic is "classic")
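For illustration, here is a minimal sketch of step 2. It is written in Python rather than R only to keep it dependency-free; the same arithmetic is a one-liner on the DESeq2 results table. The gene names and values are made up, and the helper names (`rank_metric`, `build_rnk`, `format_rnk`) and the `min_p` floor are my own assumptions, not part of DESeq2 or GSEA. It assumes the .rnk file is plain tab-delimited text with a gene identifier and a score on each line.

```python
import math

# Hypothetical DESeq2-style rows: (gene, log2FoldChange, pvalue).
# Values are invented purely for illustration.
rows = [
    ("GENE_A", 2.0, 1e-6),
    ("GENE_B", -1.5, 1e-4),
    ("GENE_C", 0.3, 0.5),
    ("GENE_D", -0.8, None),  # a missing (NA) p-value
]

def rank_metric(log2fc, pvalue, min_p=1e-300):
    """Return sign(log2FC) * -log10(p), or None if either value is missing."""
    if log2fc is None or pvalue is None:
        return None
    p = max(pvalue, min_p)  # assumed floor so that p == 0 does not give infinity
    sign = (log2fc > 0) - (log2fc < 0)
    return sign * -math.log10(p)

def build_rnk(rows):
    """Score every gene, drop genes without a score, sort from high to low."""
    scored = [(gene, rank_metric(fc, p)) for gene, fc, p in rows]
    scored = [(gene, s) for gene, s in scored if s is not None]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored

def format_rnk(ranked):
    """Render the ranked list as tab-delimited text, one gene per line."""
    return "".join(f"{gene}\t{score:.6g}\n" for gene, score in ranked)

ranked = build_rnk(rows)
rnk_text = format_rnk(ranked)
# To save it: open("deseq2_ranked.rnk", "w").write(rnk_text)
```

Genes with a missing p-value are dropped here; whether to drop or otherwise handle them is a choice you have to make for your own data.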

Am I on the right track in understanding this workflow?

Since you are already working in R, I recommend using an R implementation of pre-ranked GSEA: either the fgsea package or the clusterProfiler/DOSE interface. It's the same method but much faster.

To answer your question, I normally use the stat column of the DESeq2 results, but your metric should also work fine.

Looks good. I would always recommend trying it your own way and then getting feedback. The metric -log10(p-value) * sign(fold change) has been used in published papers, so there is no problem with it.

Thanks for clarifying! Could you elaborate a little on using the unadjusted p-value vs. the FDR-adjusted p-value for the metric calculation? Also, if I use metric = sign(log2FC) * (-log10(pval)), what is the appropriate setting for the scoring scheme (classic vs. weighted (p=1))? How should I decide between the classic and weighted schemes? Thanks in advance for your input!
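For context on what the scoring scheme changes: in the GSEA running-sum statistic, each gene in the set adds a step proportional to |score|^p and each gene outside the set subtracts a constant step. With p=0 (classic) every hit counts equally, so only the order of the ranked list matters; with p=1 (weighted) the magnitude of the metric matters too. Below is a rough sketch of that calculation, assuming a list of (gene, score) pairs sorted from high to low. The function name, toy ranking, and gene set are hypothetical, and it computes only the raw enrichment score, with no permutation testing or normalization.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Raw running-sum enrichment score for a ranked list of (gene, score).

    p=0 gives every hit the same step size (classic);
    p=1 weights each hit by the absolute value of its score (weighted).
    """
    hit_weights = [abs(s) ** p for g, s in ranked if g in gene_set]
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running = 0.0
    best = 0.0  # largest deviation from zero seen so far
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranking (already sorted by score) and a toy gene set.
toy_ranked = [("A", 6.0), ("B", 2.0), ("C", 0.3), ("D", -1.0), ("E", -4.0)]
toy_set = {"A", "C"}

es_classic = enrichment_score(toy_ranked, toy_set, p=0)
es_weighted = enrichment_score(toy_ranked, toy_set, p=1)
```

In this toy case the weighted score is dominated by gene A, which has a large metric, while the classic score treats A and C the same.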

Are you doing this to cross-check the significance of the set of genes selected by DESeq2?

