clusterProfiler: What are GeneRatio and BgRatio?
6.6 years ago
ZheFrench ▴ 520

Question is in the title.

GeneRatio looks like M/N, where M is the number of genes from your input list that match the GO term. But I don't see what N is.

BgRatio looks like A/B, where B is all genes in the database, but I'm not sure what A corresponds to. Is it the number of genes in the database that are annotated to this GO term?

Tell me if I'm wrong. Thanks.

5.3 years ago
molla.linda ▴ 180

I will give an example that helped me understand this. I was also looking for the answer, and Guangchuang's link helped.

Let's suppose I have a collection of genesets called HALLMARK, and that it contains a specific geneset called E2F_targets.

BgRatio is M/N.

M = size of the geneset (e.g. the size of E2F_targets), i.e. the number of genes in the background distribution that are annotated, directly or indirectly, to the node of interest.

N = number of unique genes in the whole collection of genesets (e.g. the HALLMARK collection), i.e. the total number of genes in the background distribution (the universe).

GeneRatio is k/n.

k = size of the overlap between the vector of gene IDs you input and the specific geneset (e.g. E2F_targets), counting unique genes only, i.e. the number of genes in your list of n that are annotated to the node.

n = size of the overlap between the vector of gene IDs you input and all members of the collection of genesets (e.g. the HALLMARK collection), counting unique genes only, i.e. the size of the list of genes of interest.
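The four quantities can be reproduced with plain set operations. The sketch below is not clusterProfiler's own code: the collection, the geneset contents and the gene IDs (`collection`, `my_genes`, `g1`, `x1`, ...) are invented for illustration. It assumes, as in the definition of n above, that input genes absent from the whole collection are dropped before n is counted.

```r
# Hypothetical toy collection with two genesets (names and IDs are invented).
collection <- list(
  E2F_TARGETS = c("g1", "g2", "g3", "g4", "g5"),
  OTHER_SET   = c("g4", "g5", "g6", "g7", "g8", "g9", "g10")
)
# Hypothetical input list; "x1" and "x2" are in no geneset, "g1" is duplicated.
my_genes <- c("g1", "g2", "g4", "g6", "x1", "x2", "g1")

universe <- unique(unlist(collection))          # all unique genes in the collection
N <- length(universe)                           # size of the universe
M <- length(unique(collection$E2F_TARGETS))     # size of the geneset of interest

# Input genes that occur somewhere in the collection (unique only).
annotated <- intersect(unique(my_genes), universe)
n <- length(annotated)
k <- length(intersect(annotated, collection$E2F_TARGETS))

gene_ratio <- paste0(k, "/", n)   # "3/4"
bg_ratio   <- paste0(M, "/", N)   # "5/10"
```

Note that n is 4 here, not 6: the two input genes that are missing from the collection do not count.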

I'm a little confused about these terms.

When I've used the same gene list, why do my values of n and N change when doing gene ontology analysis for different categories?

For example, for the same gene list in an over-representation test, in Biological Process the term taxis has GeneRatio 209/3770 and BgRatio 440/12553, but in Cellular Component the term extracellular matrix has GeneRatio 162/3963 and BgRatio 339/13183. Shouldn't the n and N values stay the same across GO categories?


Yeah, I have the same problem. I don't really understand why the small n changes.

I am also struggling with the same problem (i.e. n and N are changing). Have you figured it out?
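One likely explanation, following the definitions given earlier in this thread: n counts only the input genes that are annotated to at least one term in the ontology being tested, and N counts only the background genes annotated in that ontology. Biological Process and Cellular Component annotate different sets of genes, so both totals shift even though the input list is identical. Here is a toy illustration in base R; the `bp` and `cc` tables, the term names and the gene IDs are all invented, and `totals` is a hypothetical helper, not a clusterProfiler function.

```r
# Hypothetical term-to-gene tables for two ontologies (invented IDs).
bp <- data.frame(term = c("taxis", "taxis", "other_bp", "other_bp"),
                 gene = c("g1", "g2", "g2", "g3"))
cc <- data.frame(term = c("ecm", "ecm", "other_cc", "other_cc"),
                 gene = c("g1", "g4", "g5", "g6"))

my_genes <- c("g1", "g2", "g3", "g4")   # the same input list for both tests

# n and N for one ontology: only genes annotated in that ontology are counted.
totals <- function(genes, annotation) {
  universe <- unique(annotation$gene)
  c(n = length(intersect(unique(genes), universe)),
    N = length(universe))
}

bp_totals <- totals(my_genes, bp)   # n = 3, N = 3
cc_totals <- totals(my_genes, cc)   # n = 2, N = 4
```

The denominators differ between the two ontologies because "g4" has no BP annotation and "g2" and "g3" have no CC annotation in this toy data.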

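library(clusterProfiler)

# 15 genes of interest: a..o
genes <- letters[1:15]
# Two genesets: genesetX = a..j (10 genes), genesetY = b..z (25 genes)
gs_df <- data.frame("gs_name"=c(rep("genesetX", 10), rep("genesetY", 25)),
                    "entrez_gene"=c(letters[1:10], letters[2:26]))
enricher(gene = genes, TERM2GENE = gs_df, minGSSize=1)@result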

               ID Description GeneRatio BgRatio      pvalue    p.adjust       qvalue                      geneID Count
genesetX genesetX    genesetX     10/15   10/26 0.000565352 0.001130704 0.0005951074         a/b/c/d/e/f/g/h/i/j    10
genesetY genesetY    genesetY     14/15   25/26 1.000000000 1.000000000 0.5263157895 b/c/d/e/f/g/h/i/j/k/l/m/n/o    14

GeneRatio = k/n

  • k is the overlap between your genes-of-interest and the geneset
  • n is the number of all unique genes-of-interest (those found in at least one geneset)

BgRatio = M/N

  • M is the number of genes within each geneset
  • N is the number of all unique genes across all genesets (universe)
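These four numbers are all an over-representation test needs. A minimal sketch, assuming the p-value is the upper tail of a hypergeometric distribution, P(X >= k): base R's `phyper`, fed with the k, n, M and N read off the table above, should reproduce the pvalue column. The variable names `p_x` and `p_y` are mine.

```r
# genesetX from the table above: GeneRatio 10/15, BgRatio 10/26
k <- 10; n <- 15; M <- 10; N <- 26

# P(X >= k) when n genes are drawn from a universe of N, M of which are in the geneset.
# phyper(q, m, n, k) arguments: q = k - 1, m = M (in set), n = N - M (not in set), k = n (drawn)
p_x <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# genesetY: GeneRatio 14/15, BgRatio 25/26
p_y <- phyper(14 - 1, 25, 26 - 25, 15, lower.tail = FALSE)

print(c(genesetX = p_x, genesetY = p_y))
```

genesetY gets a p-value of 1 because it covers 25 of the 26 genes in the universe, so any 15 genes drawn from the universe must contain at least 14 of its members.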
13 months ago
sarahhp ▴ 20

Or perhaps in simpler terms GeneRatio = genes of interest in the gene set / total genes of interest. Most often I use it on lists of differentially expressed genes and so GeneRatio is also the fraction of differentially expressed genes found in the gene set.

I have struggled to find the right words to explain this to others, so I hope this helps!
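Since both ratios are stored as text such as "10/15", it can help to turn them into actual fractions before comparing or plotting them. A small base R sketch, using the ratio strings from the example table earlier in the thread; `ratio_to_numeric` is a hypothetical helper name, and dividing the two fractions is just one common way to compare them.

```r
# GeneRatio / BgRatio strings as they appear in the example result table
gene_ratio <- c(genesetX = "10/15", genesetY = "14/15")
bg_ratio   <- c(genesetX = "10/26", genesetY = "25/26")

# Turn "a/b" into the number a / b
ratio_to_numeric <- function(x) {
  vapply(strsplit(x, "/", fixed = TRUE),
         function(p) as.numeric(p[1]) / as.numeric(p[2]),
         numeric(1))
}

gene_frac <- ratio_to_numeric(gene_ratio)   # fraction of genes of interest in each set
bg_frac   <- ratio_to_numeric(bg_ratio)     # fraction of the universe in each set

# Values above 1 mean the set is more common among the genes of interest
# than in the background.
fold_enrichment <- gene_frac / bg_frac
```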


