Question

GO analysis: Why do gene sets with high cluster frequency in the up-regulated category disappear from the GO analysis of all DEGs?


3.1 years ago

greyman ▴ 190

To clarify, I ran three GO analyses: (i) up-regulated DEGs, (ii) down-regulated DEGs, and (iii) all DEGs. The up-regulated category returned 120 gene sets, the down-regulated category 5 gene sets, and all DEGs 40 gene sets. I can see that the GO terms from the down-regulated category with a high number of genes (>10 per set) also appear in the "all DEGs" results. However, not all GO terms from the up-regulated DEGs appear in the "all DEGs" results, even though some of those sets contain >60 genes (high cluster frequency).

From previous posts, I see others commenting that the all-DEGs analysis is more stringent than analysing up- and down-regulated genes separately. However, that doesn't explain why gene sets with high cluster frequency (20%) and a small q-value (1.32E-02) in the up-regulated category do not show up when I run GO on all DEGs.

Thank you very much for your time.

GO GSEA RNA-Seq sequencing • 794 views


score 1 · Accepted Answer · 2021-03-16

This is just a consequence of the hypergeometric test. The test considers the total number of genes, the number of genes in your DEG list, the number of genes in the GO term, and the number of overlapping genes between the DEG and GO sets. You are changing the total DEGs and set overlap values when you use the total set versus the split sets, so different results are usually expected.

You could have a case where, for example, 10 out of 100 upregulated genes are in a GO term, and the term comes out as significant. However, once you add the downregulated genes, the overlap might still be only 10 genes, now out of 250 DEGs, and the term is no longer significantly enriched.
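To make that example concrete, here is a minimal sketch of the one-sided hypergeometric test in plain Python. The background size (20,000 genes) and GO term size (600 genes) are made-up numbers chosen purely for illustration; they are not from your data, and your GO tool may use a different background or apply multiple-testing correction on top of this raw p-value.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap).

    Models drawing n_degs genes without replacement from n_background
    genes, of which n_in_term are annotated to the GO term.
    """
    total = comb(n_background, n_degs)
    max_overlap = min(n_in_term, n_degs)
    tail = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_degs - i)
        for i in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical values, for illustration only (not from the question).
N_BACKGROUND = 20_000  # assumed number of genes in the background
N_IN_TERM = 600        # assumed number of background genes in the GO term

# Same 10 overlapping genes, but different DEG list sizes.
p_up = hypergeom_enrichment_p(N_BACKGROUND, N_IN_TERM, n_degs=100, n_overlap=10)
p_all = hypergeom_enrichment_p(N_BACKGROUND, N_IN_TERM, n_degs=250, n_overlap=10)

print(f"up-regulated only (10 of 100): p = {p_up:.3g}")
print(f"all DEGs          (10 of 250): p = {p_all:.3g}")
```

Under these assumed numbers you would expect about 3 term genes by chance in a list of 100, so 10 is a clear excess. In a list of 250 you would expect about 7.5, so the same 10 genes are no longer surprising and the p-value rises well above the usual cut-offs.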