Question: Estimating gene set enrichment using Fisher's exact test
4 months ago, Biologist wrote:

Hi,

I'm working with lung cancer data and am interested in lncRNAs. I would like to identify lncRNAs that target key pathways.

Recently, I read a paper that discusses this type of analysis: "Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context".

In this paper, Figure 4A shows lncRNAs that are predicted to target most pathways in MSigDB's Hallmark gene sets, which include proliferation, immune response, signaling, and DNA damage pathways, in multiple tumor types (pan-cancer).

In the Methods section, under "Gene set enrichment", they write:

When identifying lncRNAs whose targets are enriched in hallmark gene sets, we estimated gene set enrichment using Fisher’s Exact test between predicted lncRNA targets of each lncRNA and expressed gene set members in each of 14 tumor types using adjusted pFET < 0.01; each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested.

My question:

Usually, co-expression network analysis gives us the lncRNAs that fall in the same module as protein-coding genes, and we can then run pathway analysis on that module. This way we can find which lncRNAs regulate which pathways.

But given the information in the paper's Methods section, how does one estimate gene set enrichment using Fisher's exact test between protein-coding genes and lncRNAs?

Can anyone clear up my confusion? Could you also please explain how this can be done?

Thank you.

lncrna gsea rna-seq fisherstest

Without having read the paper, but based on your quotation, it seems clear (to me) that they take as the universe/background set all genes expressed in a given tumor, test whether the genes identified as targets of a given lncRNA are enriched among the genes of a given pathway, and consider significant only pathways with an adjusted p-value < 0.01. How the p-value is adjusted should be described in the paper.
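A minimal R sketch of this setup, under stated assumptions: all gene, lncRNA, and gene set names are invented, the test is taken to be one-sided (over-representation), and Bonferroni correction over all tests performed stands in for the paper's adjustment, which may well differ.

```r
# Sketch only: every identifier and gene list here is invented for illustration.
universe <- paste0("gene", 1:1000)           # all genes expressed in the tumor

targets <- list(                             # hypothetical predicted targets per lncRNA
  lncX = universe[1:80],
  lncY = universe[501:560]
)
gene_sets <- list(                           # hypothetical gene sets
  setP = universe[c(1:10, 901:903)],
  setQ = universe[c(75:80, 601:620)]
)

# One-sided Fisher's exact test: are set members over-represented among targets?
# Genes outside the universe are ignored because membership is evaluated per
# universe gene.
enrichment_p <- function(target_genes, set_genes, universe) {
  in_target <- factor(universe %in% target_genes, levels = c(TRUE, FALSE))
  in_set    <- factor(universe %in% set_genes,    levels = c(TRUE, FALSE))
  fisher.test(table(in_target, in_set), alternative = "greater")$p.value
}

# One test per (lncRNA, gene set) pair.
results <- expand.grid(lncRNA = names(targets), gene_set = names(gene_sets),
                       stringsAsFactors = FALSE)
results$p <- mapply(
  function(l, s) enrichment_p(targets[[l]], gene_sets[[s]], universe),
  results$lncRNA, results$gene_set, USE.NAMES = FALSE
)

# Assumption: Bonferroni over all tests performed; the paper may use another method.
results$p_adj <- p.adjust(results$p, method = "bonferroni")
print(results[results$p_adj < 0.01, ])
```

In a real analysis, the universe, the target lists, and the gene sets would come from the expression data, the target prediction, and MSigDB respectively, and the whole procedure would be repeated per tumor type.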

written 4 months ago by Jean-Karim Heriche

In the paper they say: "each test was adjusted for the total number of lncRNAs, lncRNA targets, and gene set tested".

I'm a bit confused about how the contingency table should look for Fisher's exact test in this type of analysis.

If you don't mind, could you please show a small example with numbers? Thank you.

written 4 months ago by Biologist

Assuming that in tumor A, 1000 genes are expressed, and we're interested in lncRNA X, for which we found 80 target genes, and in pathway P, which comprises 13 genes (10 of them among the targets of X), the contingency table would look like this:

            |  P  |  Not P |
---------------------------|
Targets of X|  10 |   70   |
---------------------------|
Other genes |   3 |   917  |

In R, this would be tested like this:

# matrix() fills column-wise: column 1 = in P (10, 3), column 2 = not in P (70, 917)
contingency.table <- matrix(c(10, 3, 70, 917), nrow = 2)
fisher.test(contingency.table)
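
Since enrichment is usually a one-sided question, a variant worth considering is `alternative = "greater"`. The sketch below derives the four cells from the counts in the example and checks that the one-sided p-value agrees with the hypergeometric tail probability from `phyper`. The variable names are illustrative, and whether the paper used a one-sided or two-sided test is not stated in the quoted text.

```r
# Counts from the example: 1000 expressed genes, 80 targets of X,
# 13 genes in pathway P, 10 genes that are both targets and in P.
n_universe <- 1000
n_targets  <- 80
n_pathway  <- 13
n_overlap  <- 10

tab <- matrix(
  c(n_overlap,             n_pathway - n_overlap,                           # in P
    n_targets - n_overlap, n_universe - n_targets - n_pathway + n_overlap), # not in P
  nrow = 2,
  dimnames = list(c("Targets of X", "Other genes"), c("In P", "Not in P"))
)

# One-sided test: is P over-represented among the targets of X?
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same quantity as a hypergeometric tail: P(overlap >= 10) when drawing
# 80 genes from 1000, of which 13 belong to P.
p_hyper <- phyper(n_overlap - 1, n_pathway, n_universe - n_pathway,
                  n_targets, lower.tail = FALSE)

print(tab)
print(c(fisher = p_fisher, hypergeometric = p_hyper))
```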
written 4 months ago by Jean-Karim Heriche

Thanks a ton @Jean-Karim Heriche

Does "1000 genes expressed" here mean differentially expressed in the tumor, or just expressed? And are the 80 target genes of lncRNA X co-expressed genes or neighbouring genes?

If they are co-expressed genes, what should the cutoff be for selecting target genes?

written 4 months ago by Biologist

This would be genes expressed in the tumor, since only these have a chance of being detected. As for the target genes, you'll have to read the paper if you want to know how they defined lncRNA target genes.

written 4 months ago by Jean-Karim Heriche

Sure, thank you. I will have a look.

written 4 months ago by Biologist