Question: How to use DEGs file for GSEA?
8 weeks ago, biostarukha wrote:

I want to run GSEA on the differentially expressed genes (DEGs) from my scRNA-seq analysis. The DEG table contains the gene name, logFC, p-value, and adjusted p-value. However, the Broad Institute GSEA tutorial on formatting input files uses a file of gene expression values across multiple samples, not a DEG table.

Is there any way to use DEG input in the GSEA software? If not, are there other GSEA tools that can calculate enrichment scores from DEG data?

Tags: seurat, rna-seq, gsea, scrna-seq
8 weeks ago, rpolicastro (Bloomington, IN) wrote:

GSEA generally requires a numeric value for every gene, because its calculation relies on the rank of the genes in a term relative to all other genes in the dataset. It would be better to export the log2 FC of all genes rather than filtering by a fold-change or adjusted p-value threshold. Alternatively, you could perform a regular (hypergeometric-like) enrichment analysis with your DEGs and the term database of your choice.
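To make the point about ranks concrete, here is a minimal Python sketch, not the GSEA implementation itself. It sorts an unfiltered table of log2 FC values, formats it as a two-column, tab-separated ranked list (the layout that GSEA's preranked mode reads from `.rnk` files), and computes a running-sum enrichment score in the style of the weighted Kolmogorov-Smirnov statistic described for GSEA. The function names, the toy gene names `G1` to `G6`, and their values are all hypothetical, and no permutation-based significance testing is shown.

```python
def rank_genes(de_rows):
    """Sort (gene, log2fc) pairs from an UNFILTERED DE table, highest log2FC first."""
    return sorted(de_rows, key=lambda row: row[1], reverse=True)


def to_rnk(ranked):
    """Format the ranked list as two tab-separated columns: gene, score."""
    return "".join(f"{gene}\t{score}\n" for gene, score in ranked)


def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score over a ranked list.

    Genes in the set push the sum up in proportion to |score|**p;
    genes outside the set push it down by a constant step.
    The score is the largest deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    norm = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if norm == 0:
        raise ValueError("all genes in the set have a score of zero")

    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        if hit:
            running += abs(score) ** p / norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy data: every tested gene is kept, significant or not.
de_rows = [("G1", 2.5), ("G2", -1.8), ("G3", 0.1),
           ("G4", 1.2), ("G5", -0.3), ("G6", -2.2)]
ranked = rank_genes(de_rows)
rnk_text = to_rnk(ranked)
es_up = enrichment_score(ranked, {"G1", "G4"})
```

The downward steps come from genes that are not in the set, so removing non-significant genes before ranking changes the score itself, not just the size of the table.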


Thank you for your comment. If I use the log2FC of all genes without a p-value restriction, how can I ensure the significance of the results? Some genes in my data have a high logFC but also a very high adjusted p-value. Also, how do I impute values for DEGs that are absent in some clusters?

written 7 weeks ago by biostarukha

You can't ensure the significance of the results that way, because manipulating the data to any appreciable extent violates the assumptions of the test. As long as you are not filtering the data by p-value or log FC, don't worry if some clusters are missing genes because of low or no expression. If you feel your log FC or adjusted p-value thresholds are critical, you should perhaps switch to an overrepresentation test.
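For reference, an overrepresentation test on a thresholded DEG list can be sketched as a one-sided hypergeometric test. The Python below is a minimal illustration using only the standard library. The function name and the toy gene identifiers are hypothetical, the background ("universe") is assumed to be every gene that was tested for differential expression, and correction for testing many terms is not shown.

```python
from math import comb


def overrepresentation_p(universe, degs, gene_set):
    """One-sided hypergeometric test for overlap between DEGs and a term.

    Returns (k, p): k is the number of DEGs in the term and p is the
    probability of seeing k or more by chance when drawing len(degs)
    genes from the universe without replacement.
    """
    universe = set(universe)
    degs = set(degs) & universe      # only genes that were actually tested
    term = set(gene_set) & universe  # term members present in the background
    n_universe, n_term, n_degs = len(universe), len(term), len(degs)
    k = len(degs & term)

    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_degs - i)
        for i in range(k, min(n_term, n_degs) + 1)
    )
    return k, tail / total


# Hypothetical toy example: 10 tested genes, 3 DEGs, a 3-gene term.
universe = [f"g{i}" for i in range(10)]
overlap, p_value = overrepresentation_p(universe, {"g0", "g1", "g5"}, {"g0", "g1", "g2"})
```

Unlike the rank-based approach, this test only asks whether the thresholded list overlaps a term more than expected, so the choice of background and cutoffs directly affects the result.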

written 7 weeks ago by rpolicastro

Thank you, I understand that now. So I think I should use the average expression of all genes in each cluster and treat the clusters as if they were samples in bulk RNA-seq GSEA. If I understand correctly, bulk GSEA expects disease and normal samples. But in scRNA-seq, cluster marker genes are usually found by comparing the expression of a gene in the cluster of interest with its expression in all remaining cells. How should I choose 'phenotypes' in this case?

written 7 weeks ago by biostarukha