I have RNA-seq count data, and I have already identified differentially expressed genes (DEGs) and performed a protein-protein interaction (PPI) analysis for those DEGs.
I would like to perform GSEA to compare with the previous analysis, but I am confused about what I should do:
1- Should I use all genes in the RNA-seq count data for GSEA?
2- Should I use only the DEGs for GSEA?
3- Should I use the DEGs at the same cut-off (p-value) that I used to define the DEGs for the PPI analysis, or is there a preferred cut-off?
(...) You should use all genes, or at least all relevant genes. In DESeq2 those might be the genes surviving independent filtering (i.e. those whose adjusted p-value is not NA); in edgeR, those that survive filterByExpr. GSEA tests whether a gene set as a whole (rather than individual genes, as we test in a pairwise comparison with the mentioned tools) shows evidence of being over- or underexpressed. A gene set can, as a whole, show evidence of overexpression even though its individual genes need not be significantly overexpressed in a pairwise comparison. Pairwise DE testing and GSEA simply ask two different types of questions.

For DESeq2 I would therefore use all genes surviving independent filtering, ranked, for example, by the moderated (shrunken) LFC obtained with lfcShrink. As we rank genes for GSEA, we largely lose the information about the magnitude of the ranking metric (here, the fold changes), so GSEA informs about global tendencies.

I think it makes sense to always pair GSEA results with other information, such as the fold changes from DESeq2. If your GSEA is significant, but the DESeq2 fold changes for the genes of the pathway you are testing with fgsea turn out to be tiny (very close to zero), then it is questionable whether the result is biologically meaningful, even though it was significant in GSEA rank space. More generally, I think combining different analysis methods to make confident statements always makes sense, not just in the GSEA context.
Does that make sense to you?
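To make the "use all filtered genes, rank them, then test the set as a whole" idea concrete, here is a minimal, stdlib-only Python sketch. It assumes you have already exported per-gene shrunken log2 fold changes and adjusted p-values from DESeq2 into a plain dict (the `results` layout and all function names are hypothetical), with `None` standing in for the NA that independent filtering produces. It computes only the enrichment score from the weighted running sum described in the original GSEA paper (Subramanian et al., 2005). It does not compute permutation p-values or normalized scores the way fgsea does, so treat it as an illustration rather than a replacement. The last helper reports the mean absolute fold change of the set, in the spirit of checking whether a significant set actually moves.

```python
"""Toy GSEA enrichment score on a pre-ranked gene list (illustration only)."""


def rank_genes(results):
    """Drop genes whose padj is None (NA, i.e. removed by independent
    filtering) and sort the rest by the ranking metric, largest first.

    `results` is a hypothetical dict: gene -> (shrunken_lfc, padj).
    """
    kept = {g: lfc for g, (lfc, padj) in results.items() if padj is not None}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score (Subramanian et al., 2005).

    Walking down the ranked list, each gene in the set raises the running
    sum by |metric|**p / (sum of |metric|**p over set genes); each gene not
    in the set lowers it by 1 / (number of genes not in the set). The score
    is the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    norm_hit = sum(abs(s) ** p for (_, s), hit in zip(ranked, hits) if hit)
    if n_miss == 0 or norm_hit == 0:
        raise ValueError("gene set must overlap the list without covering it")
    running = best = 0.0
    for (_, score), hit in zip(ranked, hits):
        running += abs(score) ** p / norm_hit if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def mean_abs_lfc(ranked, gene_set):
    """Mean |log2 fold change| of the set genes present in the ranked list."""
    vals = [abs(lfc) for g, lfc in ranked if g in gene_set]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Made-up toy values, not real data.
    results = {
        "A": (2.0, 0.01), "B": (1.5, 0.03), "C": (0.8, 0.20),
        "D": (0.1, 0.90), "E": (-0.2, None), "F": (-1.0, 0.05),
        "G": (-2.5, 0.001),
    }
    ranked = rank_genes(results)  # "E" is dropped because its padj is None
    up_set = {"A", "B"}
    print(enrichment_score(ranked, up_set), mean_abs_lfc(ranked, up_set))
```

Setting `p=0` ignores the metric magnitudes entirely and gives the classic unweighted, rank-only statistic; with the default `p=1`, genes with larger fold changes contribute larger upward steps. Either way, a high score paired with a near-zero mean fold change is exactly the situation the answer above warns about.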
You should read up on GSEA; it sounds like you don't yet have a good grasp of what the process involves, which could lead you to misinterpret the results. The original paper gives a good overview of the theory, and this page gives some good tips on choosing a rank statistic.
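As one illustration of a rank statistic (a commonly used choice, not necessarily what the linked page recommends), genes can be ranked by sign(log2FC) * -log10(p-value), which places strongly and significantly up-regulated genes at the top and down-regulated ones at the bottom. The sketch below is stdlib-only Python; the function names, the `results` dict layout, and the `floor` clipping value are hypothetical choices for this example.

```python
import math


def signed_significance(lfc, pvalue, floor=1e-300):
    """Ranking metric sign(log2FC) * -log10(p).

    p-values of exactly 0 (which can occur on numerical underflow) are
    clipped to `floor` so the metric stays finite.
    """
    return math.copysign(-math.log10(max(pvalue, floor)), lfc)


def rank_by_signed_significance(results):
    """Return gene names sorted by the metric, largest first.

    `results` is a hypothetical dict: gene -> (log2FC, pvalue); genes with a
    missing (None) p-value are skipped.
    """
    scored = {
        g: signed_significance(lfc, p)
        for g, (lfc, p) in results.items()
        if p is not None
    }
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Made-up toy values, not real data.
    toy = {"A": (1.2, 1e-6), "B": (-0.4, 0.2), "C": (-2.0, 1e-4), "D": (0.3, None)}
    print(rank_by_signed_significance(toy))
```

Clipping at `floor` keeps the metric finite, but it also makes all clipped genes tie, so if many p-values underflow, a shrunken fold change may be the more informative metric.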