Question: Pathway analysis and Gene enrichment Analysis Queries
0
bioinforesearchquestions (280) wrote:

Hi All,

I am working with RNA-seq samples and plan to do pathway analysis and gene enrichment analysis. I don't have much background in these analyses yet and am currently doing some background research. If you know of any useful resources, please share them with me.

  • In the first place, why do we do pathway analysis and gene enrichment analysis?
  • I have a set of genes that are upregulated and downregulated between wild type and mutant. How do I get an enrichment score for the upregulated and the downregulated genes?
  • How do I identify which pathways are enriched in the wild type and mutant samples?
  • How do I identify which pathways are enriched in the upregulated or downregulated genes?
Modified 18 months ago by dz2353 (90) • written 18 months ago by bioinforesearchquestions (280)
3
dz2353 (90) wrote:

Hi, maybe you can try Metascape (web-based). For pathway analysis, I used IPA, but it is commercial software.

Written 18 months ago by dz2353 (90)
2
Kevin Blighe (60k) wrote:

In the first place, why do we do pathway analysis and gene enrichment analysis?

Sorry, please do your own background reading in order to understand this. Go to a search engine, type in the keywords "ncbi enrichment pathway analysis", and then start to read.

I have a set of genes that are upregulated and downregulated between wild type and mutant. How do I get an enrichment score for the upregulated and the downregulated genes?

Perform the enrichment separately on each list, using the direction (sign) of the fold-change to define up- and down-regulation.
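As a rough illustration of what "perform the enrichment separately" can look like, here is a minimal Python sketch. It splits genes by the sign of their log fold-change and runs a one-sided hypergeometric (over-representation) test on each list. The gene names, gene sets, and function names are hypothetical, the hypergeometric test is only one common choice and not necessarily what any particular tool does, and no multiple-testing correction is applied. Note that which sign means "up in the mutant" depends on which sample is the numerator of the fold-change.

```python
from math import comb

def split_by_direction(log_fc):
    """Split genes into positive- and negative-logFC sets."""
    positive = {g for g, fc in log_fc.items() if fc > 0}
    negative = {g for g, fc in log_fc.items() if fc < 0}
    return positive, negative

def hypergeom_pvalue(hits, list_size, set_size, universe_size):
    """P(X >= hits) when list_size genes are drawn from a universe of
    universe_size genes, of which set_size belong to the gene set."""
    upper = min(list_size, set_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

def enrich(gene_list, gene_sets, universe):
    """Over-representation p-value of each gene set within gene_list."""
    genes = gene_list & universe
    results = {}
    for name, members in gene_sets.items():
        members = members & universe
        hits = len(genes & members)
        results[name] = hypergeom_pvalue(
            hits, len(genes), len(members), len(universe)
        )
    return results

# Toy data (hypothetical): 10 measured genes, 5 of them differentially expressed.
universe = {f"g{i}" for i in range(10)}
log_fc = {"g0": 2.0, "g1": 1.5, "g2": 3.1, "g7": -2.2, "g8": -1.1}
gene_sets = {
    "toy_pathway_A": {"g0", "g1", "g2"},
    "toy_pathway_B": {"g7", "g8", "g9"},
}

positive, negative = split_by_direction(log_fc)
positive_results = enrich(positive, gene_sets, universe)
negative_results = enrich(negative, gene_sets, universe)
print(positive_results)
print(negative_results)
```

In real use the universe would be all genes that were tested, and the p-values would need correction for the number of gene sets examined.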

How do I identify which pathways are enriched in the wild type and mutant samples?

There are different ways to do this. This could be the same as the answer that I gave in the previous point, or you could define a threshold Z-score for 'expressed' versus 'not expressed' (using the entire unfiltered dataset), and then perform the enrichment and / or pathway analysis separately on the genes passing the threshold in wild type and, then, in mutant.
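A minimal Python sketch of the Z-score thresholding idea follows. The log2(x + 1) transform, the cutoff of 0, and all names and values are assumptions chosen for illustration; the answer above does not specify them.

```python
from math import log2
from statistics import mean, pstdev

def expressed_genes(expression, z_cutoff=0.0):
    """Return genes whose Z-score exceeds z_cutoff within one condition.

    expression: dict of gene -> expression value (e.g. FPKM) computed
    over the entire unfiltered dataset for that condition.
    """
    logged = {g: log2(v + 1) for g, v in expression.items()}
    mu = mean(logged.values())
    sd = pstdev(logged.values())
    if sd == 0:
        return set()  # no variation, so no gene stands out
    return {g for g, x in logged.items() if (x - mu) / sd > z_cutoff}

# Toy expression values (hypothetical) for each condition.
wild_type = {"g1": 0, "g2": 1, "g3": 3, "g4": 7, "g5": 15}
mutant = {"g1": 15, "g2": 7, "g3": 3, "g4": 1, "g5": 0}

expressed_wt = expressed_genes(wild_type)
expressed_mut = expressed_genes(mutant)
print(sorted(expressed_wt), sorted(expressed_mut))
```

Each resulting gene list can then be submitted to the enrichment or pathway tool of choice, once for wild type and once for mutant.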

How do I identify which pathways are enriched in the upregulated or downregulated genes?

Perform the pathway analysis separately on each list, using the direction (sign) of the fold-change to define up- and down-regulation.

-----------------------------

Some resources to get you started:

Kevin

Modified 18 months ago • written 18 months ago by Kevin Blighe (60k)
2

Adding to the list,

Command-line based: Gene Set Clustering based on Functional annotation (GeneSCF)

Written 18 months ago by EagleEye (6.6k)
2

I'm also going to recommend my very recent answer to a similar question for why we do enrichment analyses and how they work.

Other resources include clusterProfiler (R) and enrichR (web-based and R).

Written 18 months ago by jared.andrews07 (5.5k)

Good answer on the other thread, jared - had not seen it. Thanks!

Written 18 months ago by Kevin Blighe (60k)

Hi Kevin, Sample 1 is the mutant and Sample 2 is the wild type. The list given to me (a cuffdiff output file) contains 680 genes. To check my understanding, I computed log2(Value_2/Value_1), i.e. wild type/mutant, and got the same logFC as in the cuffdiff output. As you suggested, I have now categorized the genes by log fold change.

For 110 genes, the logFC values are positive, ranging from 1.02 to 4.8, so these genes are downregulated in the mutant sample.
For 570 genes, the logFC values are negative, ranging from -9 to -1, so these genes are upregulated in the mutant sample. Is my understanding correct?
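The sign convention described above can be checked with a small Python sketch. It assumes, as stated in the post, that value_1 is the mutant and value_2 is the wild type, and that both values are non-zero; the function names and numbers are hypothetical.

```python
from math import log2

def log2_fold_change(value_1, value_2):
    """log2(value_2 / value_1); here value_1 = mutant, value_2 = wild type."""
    return log2(value_2 / value_1)

def direction_in_mutant(log_fc):
    """Interpret the sign of log2(wild type / mutant) from the mutant's side."""
    if log_fc > 0:
        return "down in mutant"  # wild type is higher than mutant
    if log_fc < 0:
        return "up in mutant"    # mutant is higher than wild type
    return "unchanged"

# Toy values (hypothetical): (mutant, wild type) expression per gene.
examples = {"geneA": (2.0, 8.0), "geneB": (16.0, 2.0)}
for gene, (mut, wt) in examples.items():
    fc = log2_fold_change(mut, wt)
    print(gene, fc, direction_in_mutant(fc))
```

If the ratio were instead taken as mutant/wild type, the two labels would swap, so it is worth confirming the sample order before interpreting any enrichment result.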

I am planning to use GSEA. I have prepared three ranked gene list files (sorted by logFC, descending):

1) all 680 genes and their logFC values

2) the 570 upregulated genes and their logFC values

3) the 110 downregulated genes and their logFC values

Should I run GSEA separately on the upregulated and downregulated gene lists, or on the total gene list?
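For reference, a ranked list of this kind can be produced with a few lines of Python. This is a sketch that assumes the two-column, tab-separated layout (gene, score) used by GSEA's .rnk ranked-list files; the gene names, values, and output filename are hypothetical.

```python
def to_rnk(log_fc):
    """Format gene -> logFC as tab-separated lines, sorted by logFC descending."""
    ranked = sorted(log_fc.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score}\n" for gene, score in ranked)

# Toy logFC values (hypothetical).
log_fc = {"GeneA": -2.5, "GeneB": 4.8, "GeneC": 1.02}

all_genes_rnk = to_rnk(log_fc)
positive_rnk = to_rnk({g: v for g, v in log_fc.items() if v > 0})
negative_rnk = to_rnk({g: v for g, v in log_fc.items() if v < 0})

# Writing one of the lists to disk (hypothetical filename).
with open("all_genes.rnk", "w") as handle:
    handle.write(all_genes_rnk)
```

The same function covers all three lists, since the only difference between them is which genes are passed in.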

Modified 18 months ago • written 18 months ago by bioinforesearchquestions (280)
1

I would likely run all three lists, as you can make different statements about each. For the full list, you can say that enriched pathways are perturbed or deregulated. Maybe the genes are split between up/down regulated. It still provides you something to hypothesize about, though actual effects would have to be measured more directly.

The up/down lists yield more direct observations. For instance, maybe many genes involved in calcium signaling are upregulated in the mutant, which might allow you to speculate something about the mutant phenotype. Perhaps something that could be easily experimentally validated.

Either way, running an additional list is easy, so there's no reason not to do all 3 sets.

Written 18 months ago by jared.andrews07 (5.5k)

Thanks, Jared. I have run GSEA on all three, but I am not sure which one is more meaningful to interpret.

For instance, when I ran GSEA on the upregulated gene list (570 genes), I selected the gene set database "Mouse_GOBP_AllPathways_no_GO_iea_October_01_2018_symbol.gmt". GSEA finished successfully. In the GSEA report for the upregulated gene list, I see:

100/648 gene sets are upregulated in phenotype na_pos

42 gene sets are significant at FDR < 25%

32 gene sets are significantly enriched at nominal pvalue < 1%

548/648 gene sets are upregulated in phenotype na_neg

35 gene sets are significantly enriched at FDR < 25%

24 gene sets are significantly enriched at nominal pvalue < 1%

What are na_pos and na_neg? Do they correspond to mutant and wild type? How can I tell which is which?

How should I interpret these values?

Modified 18 months ago • written 18 months ago by bioinforesearchquestions (280)