Question: How to do GSEA for single cells?
7 months ago, yingnanlei0202 wrote:

Hi, guys,

I would like to run GSEA between two different clusters, but I have no idea how to select the DEGs for it. I tried the FindMarkers function (markers = FindMarkers(object = cluster, ident.1 = 0, only.pos = FALSE, verbose = T, logfc.threshold = 0, min.pct = 0, test.use = "wilcox")), hoping to get differential expression statistics for the whole gene list. However, several hundred genes are missing from the result (19470 genes going in, 19074 coming out), and the p-value of the top genes ranked by logFC is 0. Could you please help me understand this? Do you have any recommendations for single-cell GSEA? Thanks in advance!

rna-seq • 1.3k views

When you have a large number of replicates, you can get extremely low p-values.

written 7 months ago by igor

Hi, thanks for your reply. By "replicates", do you mean the number of cells? And is a p-value of 0 valid?

written 7 months ago by yingnanlei0202

Yes, replicates in this case are cells. If you have a gene that, for example, is not expressed in one group of cells and notably expressed in another, and each group has 1000 cells, then Wilcoxon p-values are tiny or even rounded to zero.

written 7 months ago by ATpoint
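As a rough illustration of why such p-values reach exactly zero, here is a minimal sketch using only the Python standard library. It assumes the extreme case of complete separation (every cell in one group outranks every cell in the other, so U = 0) and uses the plain normal approximation to the Wilcoxon rank-sum test without tie or continuity corrections, so the numbers will differ in detail from what FindMarkers reports. The function name `separated_wilcoxon_p` is made up for this example.

```python
import math

def separated_wilcoxon_p(n1, n2):
    """Normal-approximation Wilcoxon rank-sum result for complete separation.

    Assumes U = 0, i.e. every cell in group 1 ranks below every cell in
    group 2. No tie or continuity correction is applied (an assumption).
    Returns (z, two-sided p as a float, asymptotic log10 of that p).
    """
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (0.0 - mean_u) / sd_u
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Large-|z| approximation of the same tail, kept on the log scale
    # so it stays finite even when p itself underflows.
    ln_p = math.log(2.0) - z * z / 2.0 - math.log(abs(z) * math.sqrt(2.0 * math.pi))
    return z, p, ln_p / math.log(10.0)

for n in (10, 100, 1000):
    z, p, log10_p = separated_wilcoxon_p(n, n)
    print(f"cells per group = {n:4d}  z = {z:8.2f}  p = {p!r}  log10(p) ~ {log10_p:.1f}")
```

With 1000 cells per group, the true tail probability is smaller than the smallest positive double-precision number, so `p` is stored as 0.0 while the log-scale value remains finite. A reported p-value of 0 therefore means "too small to represent", not literally zero.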

OK, got it! Thanks for your explanation. One more question: how did you get the differentially expressed genes for single-cell GSEA (what kind of method)? And did you use the whole gene list, or only pick a subset of genes based on logFC and a value such as the p-value? Thanks in advance!

written 7 months ago by yingnanlei0202

I am not sure what you mean by single-cell GSEA. What you have are differential expression stats. Those are analogous to what you have with bulk RNA-seq. A fold change is a fold change either way.

written 7 months ago by igor

For bulk RNA-seq, I used the GSEA software for gene set enrichment analysis. It needs two input files, the expression dataset (the whole expression matrix, not only the DEGs) and the phenotype labels, plus one of the gene set collections from MSigDB (C1, C2, ..., C8).

For single-cell RNA-seq, as I understand it, if we use an R library for GSEA, we first need to prepare the DEG table, with one column of gene names and one column of values (logFC or p-value), and then rank it; the ranked values are passed as the stats to a library such as fgsea. We also need to prepare the gene set database; I selected msigdbr(species = "Homo sapiens", category = "C2") as the pathways.

My question is how many genes we should choose: should we filter by logFC and p-value, or by something else? For bulk RNA-seq we feed the whole gene list into the GSEA software, so I am confused about DEG selection. I am sorry, I am a beginner in RNA-seq. Thanks for your patient explanation!

written 7 months ago by yingnanlei0202
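The ranking step described above (one column of genes, one column of values, then sort) can be sketched as follows. This is only an illustration in plain Python, not the fgsea workflow itself. The gene names and numbers are invented, and the signed -log10(p) metric with a floor of 1e-300 for zero p-values is one possible convention, assumed here for the example.

```python
import math

# Hypothetical DE table: gene -> (log fold change, p-value). All values are made up.
de_table = {
    "GENE_A": (2.5, 0.0),     # p-value that underflowed to zero
    "GENE_B": (1.2, 1e-40),
    "GENE_C": (0.1, 0.6),
    "GENE_D": (-0.8, 1e-12),
    "GENE_E": (-2.0, 0.0),    # also reported as zero
}

def rank_by_logfc(table):
    """Return (gene, logFC) pairs for every gene, sorted from most up to most down."""
    pairs = [(gene, fc) for gene, (fc, _p) in table.items()]
    return sorted(pairs, key=lambda item: item[1], reverse=True)

def rank_by_signed_p(table, floor=1e-300):
    """Return (gene, sign(logFC) * -log10(p)) pairs, sorted descending.

    p-values are clamped at `floor` (an assumed value) so that p = 0
    gives a large finite score instead of infinity.
    """
    pairs = []
    for gene, (fc, p) in table.items():
        sign = 1.0 if fc >= 0 else -1.0
        pairs.append((gene, sign * -math.log10(max(p, floor))))
    return sorted(pairs, key=lambda item: item[1], reverse=True)

for gene, score in rank_by_logfc(de_table):
    print(f"logFC ranking      {gene}  {score:8.2f}")
for gene, score in rank_by_signed_p(de_table):
    print(f"signed -log10(p)   {gene}  {score:8.2f}")
```

Note that clamping makes every gene with p = 0 share the same score, so those genes are tied in the ranking; ranking by logFC does not have that problem.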

GSEA can take a pre-ranked list. You don't need to filter it. There are a few different earlier discussions about it here.

modified 7 months ago • written 7 months ago by igor
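To show why a pre-ranked list is used whole rather than filtered, here is a minimal sketch of the weighted running-sum enrichment score that pre-ranked GSEA is built on. It is a simplified illustration in plain Python, not the code of the GSEA software or of fgsea: it computes only the score (no permutation p-values or normalization), and the gene names, statistics and gene sets are invented.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score over a full ranked gene list.

    ranked   : list of (gene, statistic) pairs sorted by statistic, descending
    gene_set : set of gene names
    weight   : exponent applied to |statistic| for genes in the set
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _stat in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(stat) ** weight
                    for (_gene, stat), is_hit in zip(ranked, hits) if is_hit)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("need genes both inside and outside the set")

    running = 0.0
    best = 0.0
    for (_gene, stat), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(stat) ** weight / hit_total   # step up on a set member
        else:
            running -= 1.0 / n_miss                      # step down on any other gene
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list covering all tested genes, not only significant ones.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
          ("G4", -0.5), ("G5", -1.0), ("G6", -2.5)]

print(enrichment_score(ranked, {"G1", "G2"}))   # set concentrated at the top
print(enrichment_score(ranked, {"G5", "G6"}))   # set concentrated at the bottom
print(enrichment_score(ranked, {"G1", "G6"}))   # set split between both ends
```

The step down for each gene outside the set is 1 / n_miss, so genes with small, non-significant changes still shape the running sum. Removing them beforehand changes n_miss and therefore the score.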

Got it! Thanks for your patience!

written 7 months ago by yingnanlei0202