Question

Questions: how to do GSEA for single cells?

3.9 years ago

yingnanlei0202 ▴ 20

Hi, guys,

I would like to run GSEA between two different clusters, but I am not sure how to select the DEGs for it. I ran FindMarkers (markers = FindMarkers(object = cluster, ident.1 = 0, only.pos = FALSE, verbose = T, logfc.threshold = 0, min.pct = 0, test.use = "wilcox")) hoping to get differential expression statistics for the whole gene list. However, several hundred genes are missing from the result (the list drops from 19470 to 19074 genes), and the p-value of the top genes ranked by logFC is 0. Could you please help me understand why this happens? Do you have any recommendations for GSEA on single-cell data? Thanks in advance!

RNA-Seq • 8.8k views

ADD COMMENT • link 3.9 years ago by yingnanlei0202 ▴ 20

When you have a large number of replicates, you can get extremely low p-values.

ADD REPLY • link 3.9 years ago by igor 13k

Hi, thanks for your reply. By "replicates", do you mean the number of cells? And is a p-value of 0 valid?

ADD REPLY • link 3.9 years ago by yingnanlei0202 ▴ 20

Yes, replicates in this case are cell numbers. If you have a gene that, for example, is not expressed in one group of cells and notably expressed in another, and each group has 1000 cells, then the Wilcoxon p-values are tiny or even rounded to zero.
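A minimal sketch of this effect in base R, using simulated counts rather than data from this thread. The group sizes, the Poisson rate, and the flooring step at the end are assumptions made for illustration only.

```r
# Simulated example: one gene measured in two groups of 1000 cells each.
set.seed(1)
n_cells <- 1000
group_a <- rep(0, n_cells)               # gene not expressed in group A
group_b <- rpois(n_cells, lambda = 5)    # gene clearly expressed in group B

# Wilcoxon rank-sum test, the same kind of test as test.use = "wilcox".
p_val <- wilcox.test(group_a, group_b)$p.value
print(p_val)

# If the p-value underflows to 0, -log10(p) is Inf, which matters when
# a ranking is built from p-values.
neg_log_p <- -log10(p_val)

# One possible workaround (an assumption, not something from the thread):
# floor the p-value at the smallest positive normalized double first.
neg_log_p_capped <- -log10(max(p_val, .Machine$double.xmin))
print(neg_log_p_capped)
```

With this many cells per group, the p-value reflects sample size as much as effect size, so the fold change is usually the more informative quantity to look at.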

ADD REPLY • link 3.9 years ago by ATpoint 82k

OK, got it! Thanks for your explanation. One more question: how did you get the differential genes for single-cell GSEA (which methods)? And did you use the whole gene list, or only the top differentially expressed genes filtered by logFC and p-value? Thanks in advance!

ADD REPLY • link 3.9 years ago by yingnanlei0202 ▴ 20

I am not sure what you mean by single-cell GSEA. What you have are differential expression stats. Those are analogous to what you have with bulk RNA-seq. A fold change is a fold change either way.

ADD REPLY • link 3.9 years ago by igor 13k

For bulk RNA-seq, I used the GSEA software for gene set enrichment analysis. It needs two input files, the expression dataset (the whole expression matrix, not only the DEGs) and the phenotype labels, plus one of the MSigDB gene set collections (C1, C2 … C8). For single-cell RNA-seq, as I understand it, if we use an R library for GSEA we first need to prepare the DEG table, with one column of gene names and one column of values (logFC or p-value), and then rank it; the ranked list is used as the stats for a library such as fgsea. We also need to prepare the gene set database; I selected msigdbr(species = "Homo sapiens", category = "C2") as the pathways.

My question is how many genes we should choose: should we filter by logFC and p-value, or by something else? For bulk RNA-seq we input the whole gene list into the GSEA software, so I am confused about DEG selection. Sorry, I am a beginner in RNA-seq. Thanks for your patient explanation!

ADD REPLY • link 3.9 years ago by yingnanlei0202 ▴ 20

GSEA can take a pre-ranked list. You don't need to filter it. There are a few different earlier discussions about it here.
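A minimal sketch of the pre-ranked approach, assuming a DE table shaped like FindMarkers output and ranking by logFC. Everything here is simulated: the gene names, the two gene sets, and the minSize/maxSize values are hypothetical, and the column name avg_log2FC is an assumption (older Seurat versions call it avg_logFC). The fgsea call only runs if the package is installed.

```r
# Hypothetical DE table shaped like FindMarkers output (all values simulated).
set.seed(42)
n_genes <- 500
de <- data.frame(
  avg_log2FC = rnorm(n_genes),
  row.names = paste0("GENE", seq_len(n_genes))
)

# Named numeric vector covering every tested gene, sorted by decreasing logFC.
# No filtering by logFC or p-value is applied.
ranks <- setNames(de$avg_log2FC, rownames(de))
ranks <- sort(ranks, decreasing = TRUE)

# Hypothetical gene sets in the long format that msigdbr returns
# (columns gs_name and gene_symbol), converted to a named list.
gene_sets <- data.frame(
  gs_name = rep(c("SET_A", "SET_B"), each = 30),
  gene_symbol = c(names(ranks)[1:30], sample(names(ranks), 30))
)
pathways <- split(gene_sets$gene_symbol, gene_sets$gs_name)

# Run pre-ranked GSEA only if fgsea is available.
if (requireNamespace("fgsea", quietly = TRUE)) {
  res <- fgsea::fgsea(pathways = pathways, stats = ranks,
                      minSize = 15, maxSize = 500)
  print(res)
}
```

With real data, the gene_sets table would come from the msigdbr call mentioned above and the ranks from the full FindMarkers result.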