GSEA for RNA_seq dataset
3
0
14 months ago
1961012 ▴ 10

I have RNA-seq count data and have already identified differentially expressed genes (DEGs) and run a protein-protein interaction (PPI) analysis on those DEGs. I would like to perform GSEA to compare with the previous analysis, and I am confused about what I should do:

1- Should I use all genes in the RNA-seq count data for GSEA?

2- Should I use only the DEGs for GSEA?

3- If I use the DEGs, should I apply the same p-value cut-off that was used to select the DEGs as input for the PPI analysis, or is there a preferred cut-off?

Thank you in advance!

DEGs GSEA RNA_seq • 975 views
2
14 months ago
ATpoint 62k

I tried to explain my understanding of (f)GSEA, and why one should use all (expressed) genes, in this Bioconductor support post:

https://support.bioconductor.org/p/9135326/#9135328

(...) You should use all genes, or at least all relevant genes. In DESeq2 that might be the genes surviving the independent filtering (= padj not being NA), or in edgeR those that survive filterByExpr. GSEA tests whether a gene set as a whole (rather than individual genes, as we test in a pairwise comparison with the mentioned tools) shows evidence of being over- or underexpressed. A gene set can, as a whole, show evidence of being overexpressed even though the individual genes do not need to be overexpressed (= significant) in a pairwise comparison. Pairwise DE testing and GSEA simply ask two different types of questions. For DESeq2 I would therefore use all genes surviving the independent filtering, e.g. ranked by the moderated, shrunken LFC after applying lfcShrink. As we rank genes for GSEA we obviously lose the information about the magnitude of the ranking metric (here the fold changes), so GSEA informs about global tendencies. I think it makes sense to always pair GSEA results with other information, such as the fold changes from DESeq2. If your GSEA result is significant, but the DESeq2 fold changes for the genes of the particular pathway you are fgsea-ing against turn out to be tiny (very close to zero), then it is questionable whether the result is biologically meaningful, even though the analysis was significant in GSEA rank space. But I think the practice of combining different analysis methods to make a confident statement always makes sense, not just in the GSEA context. Does that make sense to you?
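To illustrate the point that a gene set can stand out as a whole while every single gene has only a modest fold change, here is a toy Python sketch of the running-sum enrichment score. It is a simplified illustration, not what fgsea or the Broad GSEA actually run: there is no permutation-based p-value and no normalization, and the gene names, scores and the `enrichment_score` helper are all made up for the example.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set (toy version).

    ranked: list of (gene, score) tuples, sorted by score, highest first.
    gene_set: set of gene names.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if n_miss == 0 or hit_norm == 0:
        raise ValueError("need genes inside and outside the set, with non-zero scores")

    running = 0.0
    best = 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            # step up, weighted by the size of the ranking metric
            running += abs(score) ** weight / hit_norm
        else:
            # step down by a constant amount for every gene outside the set
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: 20 genes, fold changes between +0.4 and -0.36,
# i.e. nothing that would look striking gene by gene.
ranked = [(f"g{i}", 0.4 - 0.04 * i) for i in range(20)]

clustered_set = {"g0", "g1", "g2", "g3"}    # all members sit at the top of the ranking
scattered_set = {"g0", "g7", "g13", "g19"}  # members spread across the ranking

es_clustered = enrichment_score(ranked, clustered_set)
es_scattered = enrichment_score(ranked, scattered_set)
print(es_clustered, es_scattered)
```

The clustered set gets a much larger score than the scattered one although none of its genes has a large fold change, which is also why it is worth looking at the actual fold changes before calling a significant set biologically meaningful.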

0

I am sorry to ask for clarification; I am a beginner.

I have already used all genes (normalized and filtered) for GSEA with the GSEA program, version 4.1.0, but without LFC shrinkage. Is that correct?

Do you mean that I should rank the genes in the picture (normalized and filtered) by shrunken LFC and then use them as input for GSEA?

0

No problem. I am not really familiar with what the GSEA implementation from the Broad Institute does. I personally use fgsea from Bioconductor, and for this I rank the genes by shrunken logFC. What you use for ranking is up to you. I do not know how the Broad GSEA works or what kind of input it expects.
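As a rough sketch of what "rank by shrunken logFC" means in practice, written in plain Python rather than R: take the full DE results table, drop only the genes that were filtered out (padj is NA), and sort everything else by shrunken LFC. The field names (`gene`, `shrunken_lfc`, `padj`) and the toy values are assumptions for the example, not the column names of any particular tool.

```python
def rank_by_shrunken_lfc(rows):
    """Return (gene, shrunken_lfc) pairs sorted from most up- to most down-regulated.

    rows: list of dicts with keys 'gene', 'shrunken_lfc' and 'padj'.
    A padj of None stands for NA, i.e. the gene was removed by filtering.
    Non-significant genes are deliberately kept: GSEA wants the whole ranking.
    """
    kept = [
        r for r in rows
        if r["padj"] is not None and r["shrunken_lfc"] is not None
    ]
    kept.sort(key=lambda r: r["shrunken_lfc"], reverse=True)
    return [(r["gene"], r["shrunken_lfc"]) for r in kept]


# Hypothetical DE results, not real data
de_results = [
    {"gene": "GENE_A", "shrunken_lfc": 1.8, "padj": 0.001},
    {"gene": "GENE_B", "shrunken_lfc": -0.9, "padj": 0.04},
    {"gene": "GENE_C", "shrunken_lfc": 0.1, "padj": 0.80},   # not significant, still kept
    {"gene": "GENE_D", "shrunken_lfc": 0.02, "padj": None},  # filtered out, dropped
]

ranking = rank_by_shrunken_lfc(de_results)
for gene, lfc in ranking:
    print(gene, lfc)
```

Note that no significance cut-off is applied anywhere; the only genes removed are those that never passed the expression filter.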

1
14 months ago

You should read up on GSEA. It sounds like you don't have a good grasp of what the process involves, which could lead you to misinterpret the results. The original paper gives a good overview of the theory, and this page gives some good tips on providing a rank statistic.

In short though, you should use all genes.

1
14 months ago
Zhilong Jia ★ 2.1k

Use all genes, ranked by signed P-value (with the sign taken from the LogFC), and run them through the GSEAPreranked module in GSEA.
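A minimal Python sketch of that ranking, assuming the common convention that "signed P-value" means sign(LogFC) × −log10(P-value), and that the result is written as a two-column, tab-separated .rnk text (gene, score) for GSEAPreranked. The helper names, the p-value floor and the toy numbers are made up for the example.

```python
import math


def signed_p_rank(results, floor=1e-300):
    """Rank genes by sign(logFC) * -log10(p-value), highest first.

    results: iterable of (gene, log_fc, p_value) tuples; None marks a missing value.
    floor: assumed lower bound for p-values, so that p == 0 does not give infinity.
    """
    ranked = []
    for gene, log_fc, p_value in results:
        if log_fc is None or p_value is None:
            continue  # skip genes without a usable test result
        sign = 1.0 if log_fc >= 0 else -1.0
        score = sign * -math.log10(max(p_value, floor))
        ranked.append((gene, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def to_rnk(ranked):
    """Format the ranking as tab-separated 'gene<TAB>score' lines."""
    return "".join(f"{gene}\t{score:.6g}\n" for gene, score in ranked)


# Hypothetical results table, not real data
toy_results = [
    ("GENE_A", 2.1, 1e-8),
    ("GENE_B", -1.4, 1e-4),
    ("GENE_C", 0.2, 0.5),
    ("GENE_D", -0.1, None),  # no p-value, dropped
]

rnk_ranking = signed_p_rank(toy_results)
rnk_text = to_rnk(rnk_ranking)
print(rnk_text, end="")
```

Strongly up-regulated genes end up at the top, strongly down-regulated genes at the bottom, and genes with weak evidence in the middle, all without applying a cut-off.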