Question

clusterProfiler: What are GeneRatio and BgRatio?

7.7 years ago

ZheFrench ▴ 580

Question is in the title.

GeneRatio is like M/N, where M is the number of genes from your input list that match the GO term. But I don't see what N is.

BgRatio is like A/B, where B is all genes in the database, but I'm not sure what A corresponds to. Is it the number of genes in the database that are annotated to this GO term?

Tell me if I'm wrong. Thanks.

clusterProfiler • 46k views

ADD COMMENT • link updated 5 months ago by Picasa ▴ 650 • written 7.7 years ago by ZheFrench ▴ 580

score 27 · Answer 1 · 2018-03-01

I will give an example that helped me understand this. I was also looking for the answer, and Guangchuang's link helped.

Let's suppose I have a collection of genesets called HALLMARK, and that it contains a specific geneset called E2F_targets.

BgRatio is M/N.

M = size of the geneset (e.g. the size of E2F_targets); that is, the number of genes within the background distribution that are annotated (either directly or indirectly) to the node of interest.

N = number of unique genes across the whole collection of genesets (e.g. the HALLMARK collection); that is, the total number of genes in the background distribution (universe).

GeneRatio is k/n.

k = size of the overlap between the 'vector of gene id' you input and the specific geneset (e.g. E2F_targets), counting only unique genes; that is, the number of genes within the list n that are annotated to the node.

n = size of the overlap between the 'vector of gene id' you input and all the members of the collection of genesets (e.g. the HALLMARK collection), counting only unique genes; that is, the size of the list of genes of interest.
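
The four counts above can be sketched with plain set operations in base R. This is only an illustration of the definitions in this answer, not clusterProfiler's own code; the gene IDs, the two-set `collection`, and all variable names are made up for the example.

```r
# Hypothetical toy collection standing in for HALLMARK (gene IDs are invented).
collection <- list(
  E2F_TARGETS = c("g1", "g2", "g3", "g4"),
  OTHER_SET   = c("g3", "g5", "g6")
)

# Input vector with a duplicate (g1) and a gene absent from the collection (g9).
input_genes <- c("g1", "g1", "g2", "g5", "g9")

# All unique genes in the collection: the background (universe).
universe <- unique(unlist(collection))
# Unique input genes that occur somewhere in the collection.
annotated <- intersect(unique(input_genes), universe)

M <- length(unique(collection$E2F_TARGETS))                # size of the geneset
N <- length(universe)                                      # size of the background
k <- length(intersect(annotated, collection$E2F_TARGETS))  # overlap with the geneset
n <- length(annotated)                                     # overlap with the whole collection

gene_ratio <- paste(k, n, sep = "/")
bg_ratio   <- paste(M, N, sep = "/")
print(c(GeneRatio = gene_ratio, BgRatio = bg_ratio))
```

Under these definitions both n and N depend on the collection, so the same input list can show different denominators when it is tested against a different collection of genesets.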

score 4 · Answer 2 · 2016-11-06

7.7 years ago

Guangchuang Yu ★ 2.6k

See https://bioconductor.org/packages/release/bioc/vignettes/DOSE/inst/doc/enrichmentAnalysis.html#over-representation-analysis

Corresponding to the formula, GeneRatio is k/n.
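
For readers who cannot open the link, here is a sketch of the formula being referred to, assuming it is the usual one-sided hypergeometric test for over-representation. It is written in this thread's notation (N genes in the background, M of them annotated to the term, n genes of interest, k of those annotated to the term) and is not copied from the vignette.

```latex
p = 1 - \sum_{i=0}^{k-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}
```

In this notation GeneRatio is k/n and BgRatio is M/N.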

ADD COMMENT • link 7.7 years ago by Guangchuang Yu ★ 2.6k

I'm a little confused about these terms.

When I've used the same gene set, why do the values of n and N change when doing gene ontology analysis for different categories?

For example, for the same gene list in an over-representation test, in Biological Process for the term taxis the GeneRatio is 209/3770 and the BgRatio is 440/12553, but in Cellular Component for the term extracellular matrix the GeneRatio is 162/3963 and the BgRatio is 339/13183. Shouldn't the n and N values stay the same across GO categories?

Cheers

ADD REPLY • link 5.2 years ago by unawaz ▴ 60

Yeah, I have the same problem. I don't really understand why the small n is changing.

ADD REPLY • link 2.2 years ago by Arend • 0

I am also struggling with the same problem (i.e. n and N are changing). Have you figured it out?

ADD REPLY • link 2.2 years ago by yatzutzu • 0

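library(clusterProfiler)

# 15 genes of interest: "a" to "o"
genes <- letters[1:15]
# Two genesets: genesetX = a..j (10 genes), genesetY = b..z (25 genes)
gs_df <- data.frame("gs_name"=c(rep("genesetX", 10), rep("genesetY", 25)),
                    "entrez_gene"=c(letters[1:10], letters[2:26]))
enricher(gene = genes, TERM2GENE = gs_df, minGSSize=1)@result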

               ID Description GeneRatio BgRatio      pvalue    p.adjust       qvalue                      geneID Count
genesetX genesetX    genesetX     10/15   10/26 0.000565352 0.001130704 0.0005951074         a/b/c/d/e/f/g/h/i/j    10
genesetY genesetY    genesetY     14/15   25/26 1.000000000 1.000000000 0.5263157895 b/c/d/e/f/g/h/i/j/k/l/m/n/o    14

GeneRatio = k/n

k is the overlap between your genes-of-interest and the geneset
n is the number of all unique genes-of-interest that appear in at least one geneset (here all 15 do)

BgRatio = M/N

M is the number of genes within each geneset
N is the number of all unique genes across all genesets (universe)
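
As a cross-check, the pvalue column in the output above can be recomputed from k, n, M and N with base R's `phyper`, assuming the test is the one-sided hypergeometric test for over-representation. This is a sketch and not clusterProfiler's own code; the counts are read off the two result rows.

```r
# Counts taken from the genesetX and genesetY rows of the output above.
k <- c(10, 14)  # overlap between genes-of-interest and each geneset
M <- c(10, 25)  # size of each geneset
n <- 15         # number of genes-of-interest
N <- 26         # number of unique genes across all genesets (universe)

# Upper tail P(X >= k): draw n genes from N, of which M belong to the geneset.
pvalue <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(signif(pvalue, 7))
```

The two values should agree with the pvalue column (0.000565352 for genesetX and 1 for genesetY). genesetY is not enriched even though its GeneRatio is 14/15, because its BgRatio of 25/26 is just as high.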

ADD REPLY • link 16 months ago by Rene ▴ 10

The link is broken, but the content was archived by the Wayback Machine: https://web.archive.org/web/20171111072829/https://bioconductor.org/packages/release/bioc/vignettes/DOSE/inst/doc/enrichmentAnalysis.html#over-representation-analysis

Or better yet, the same info is in the clusterProfiler book: http://yulab-smu.top/clusterProfiler-book/chapter2.html#over-representation-analysis

ADD REPLY • link 3.2 years ago by JorgeVallejo ▴ 20

score 4 · Answer 3 · 2022-04-19

2.3 years ago

sarahhp ▴ 40

Or perhaps in simpler terms: GeneRatio = genes of interest in the gene set / total genes of interest. Most often I use it on lists of differentially expressed genes, so GeneRatio is also the fraction of differentially expressed genes found in the gene set.

I have struggled to find the right words to explain this to others, so I hope this helps!