Question: Gene set enrichment analysis using a curated gene list and cluster DE genes
Asked 15 months ago by asmariyaz23 (United States):

I have a curated gene list that I would like to use for enrichment analysis of the DE genes in clusters obtained with Seurat. I first tried to do this manually with Fisher's exact test, like so:

No. genes in curated list: 5840

No. DE genes in Cluster 0 (from Seurat): 512

No. Overlap genes: 209

No. Universe: 23,000

No. curated genes that are not DE: 5840 - 209 = 5631

No. DE genes that are not curated: 512 - 209 = 303

No. Untested (in neither list): 23000 - (5631 + 209 + 303) = 16857

The 2x2 contingency table is laid out as follows:

                 DE in Cluster 0    not DE
curated                209            5631
not curated            303           16857

The odds ratio looks off in this case, so I am wondering whether I designed the test correctly.
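As a quick sanity check on the table above, the sample odds ratio can be computed by hand and compared with the estimate that fisher.test reports. This is only a sketch using the counts from the question; the variable names are illustrative, and the universe of 23,000 genes is taken at face value.

```r
# Counts from the question (rows: curated yes/no, columns: DE yes/no)
tab <- matrix(c(209, 5631,
                303, 16857),
              nrow = 2, byrow = TRUE,
              dimnames = list(curated = c("yes", "no"),
                              DE      = c("yes", "no")))

# Sample (cross-product) odds ratio: (a * d) / (b * c)
or_sample <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# fisher.test reports a conditional maximum-likelihood estimate,
# which is usually close to, but not identical to, the sample odds ratio
ft <- fisher.test(tab, alternative = "greater")

print(or_sample)
print(ft$estimate)
print(ft$p.value)
```

An odds ratio above 1 here would mean that curated genes are over-represented among the cluster's DE genes relative to the rest of the universe; with these counts the cross-product ratio comes out at roughly 2.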

Secondly, I was trying to find an R package (like fgsea) that would let me do this kind of analysis. My idea was to feed all DE genes in each cluster in as a custom pathway. But I am confused about the ranked list: what should that be? I am unable to figure out where the curated gene list fits into the equation. Alternatively, is there a better approach to address this issue?
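For a GSEA-style analysis, one common arrangement (an assumption here, not something stated in the thread) is the reverse of the idea above: the curated list serves as the gene set ("pathway"), and the ranked list is every tested gene in the cluster ordered by a DE statistic such as the log fold change. The sketch below only builds those two inputs from made-up data; the gene names, `ranks`, and `pathways` are hypothetical, and the fgsea call is left commented out because it needs the Bioconductor package installed.

```r
set.seed(1)

# Hypothetical per-gene statistic for one cluster vs. the rest
# (e.g. a log fold change for every tested gene, not only significant ones)
genes <- paste0("gene", 1:1000)
stat  <- rnorm(length(genes))

# Ranked list: a named numeric vector sorted from most up- to most down-regulated
ranks <- sort(setNames(stat, genes), decreasing = TRUE)

# The curated list becomes a single named gene set
curated  <- sample(genes, 100)   # stand-in for the real curated list
pathways <- list(curated_list = curated)

# With the fgsea package installed, the call would look like:
# library(fgsea)
# res <- fgsea(pathways = pathways, stats = ranks)

print(head(ranks))
print(lengths(pathways))
```

Unlike the Fisher or hypergeometric approach, this arrangement needs no DE cutoff, since every tested gene contributes through its rank.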

Tags: R, rna-seq, enrichment

I will try it this way as well; I just need clarification on two variables, N and k.

N = Is this the total number of genes in the matrix (after the initial filtering in a single-cell package, in my case Seurat)?

k = Does this refer only to the DE genes in the cluster of interest, or to the total number of genes in the cluster?

Thank you again for your insight on this.

Reply written 15 months ago by asmariyaz23

"The odds ratio looks off in this case"

Why do you think this?

Reply written 15 months ago by Carlo Yague
Answer written 15 months ago by Jean-Karim Heriche (EMBL Heidelberg, Germany):

I think you're going about it the wrong way. If you want to know the probability of observing at least as many curated genes in a cluster of DE genes as you did, you can cast this as an urn problem. The urn holds N genes, where N is the number of genes tested for differential expression. Of these N genes, m are marked as curated. You draw k genes (the number of genes in the cluster of interest), of which q turn out to be curated. The probability of getting q or more curated genes in the cluster just by chance is then given (in R) by phyper(q-1, m, N-m, k, lower.tail=FALSE).
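Plugging the counts from the question into this formula gives the sketch below. It assumes that the 23,000-gene universe is the set of genes tested for differential expression (N) and that k is the 512 DE genes of Cluster 0; both are readings of the question, not something confirmed in the thread.

```r
N <- 23000  # genes in the urn (assumed: all genes tested for DE)
m <- 5840   # curated genes (the "marked" balls)
k <- 512    # genes drawn (assumed: DE genes in Cluster 0)
q <- 209    # curated genes among the k drawn

# P(X >= q): probability of at least q curated genes by chance
p_urn <- phyper(q - 1, m, N - m, k, lower.tail = FALSE)

# The hypergeometric distribution is symmetric in m and k,
# so swapping the roles of the two lists gives the same tail probability
p_swapped <- phyper(q - 1, k, N - k, m, lower.tail = FALSE)

# Expected overlap under the null, for comparison with q
expected <- k * m / N

print(p_urn)
print(p_swapped)
print(expected)
```

Under these assumptions the expected overlap by chance is about 130 genes, against the 209 observed.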


The hypergeometric test (urn problem) is equivalent to the corresponding one-tailed Fisher's exact test. It is just a different way of thinking about the data, and it gives the same p-value. See with the OP's data:

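> # one-tailed Fisher's exact test on the OP's 2x2 table
> fisher.test(matrix(c(209,5631,303,16857),2,2), alternative="greater")$p.value
[1] 8.277633e-15
> # hypergeometric upper tail with the same counts
> phyper(209-1,303+209,16857+5631,209+5631, lower.tail=FALSE)
[1] 8.277633e-15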
Reply written 15 months ago by Carlo Yague

I know. I was trying to clarify things for the OP, who seemed confused by the GSEA approach.

Reply written 15 months ago by Jean-Karim Heriche