I've recently performed a GSEA analysis (using fGSEA in R) on an RNA-seq data set, consisting of 4 pairwise comparisons: 1.) drug 1 vs. untreated, 2.) drug 2 vs. untreated, and 3.) drug 1 + 2 vs. untreated).
Using edgeR, I found that there were ~2000 differentially expressed genes (DEGs) for drug 1 vs. untreated, ~7000 DEGs for drug 2 vs. untreated, and >9000 genes for drug 1+2 vs. untreated. After ranking the full list of genes in each of the 3 signatures (by the following formula: -log10(adjusted p-value)/sign(logFC)), I performed GSEA using fGSEA (v. 1.10.1) in R, against the Human GO and Reactome databases (~3700 pathways). After applying a Benjamini-Hochberg p-value adjustment, I obtained the following number of ‘significant’ gene sets/pathways (adj. p < 0.05): 1.) 70 for drug 1 vs. no tx, 2.) 0 for drug 2 vs. no tx and 3.) 106 for drug 1 + 2.
I find it strange that there aren’t any ‘significant’ gene sets obtained for the ‘drug 2 vs. no tx’ signature, given the large number of DEGs found. In fact, this signature shared more than 5000 significant DEGs (with same sign(logFC) direction) with the combo drug 1 + 2 treatment.
I have 2 questions: 1.) Is it possible to have no significant pathways despite having such a large fraction of the transcriptome being differentially expressed? If so, what conditions would account for that? 2.) I ranked my genes using -log10(adjusted p-value)/sign(logFC) (instead of raw p-value, as noted here), as I only had access to logFC and B-H adjusted p-values for each of the gene signatures. Is it appropriate to rank on adj. p-value? Could this account for the few enriched pathways observed for ‘drug 2 vs. no tx’?