Question: Gene Set Size - When Is It Too Small?
7.1 years ago by
PoGibas (4.8k) wrote:

I have a small subset of genes that share a specific characteristic (e.g., a TFBS in their UTRs). I checked enrichment across the whole set using a permutation test (p value = 0). However, only a small subset of genes have this TFBS, and I don't know whether it is worth analyzing these genes further (e.g., expression, conservation), as the set is very small.


Total number of genes in set = 20000
Number of genes with TFBS = 8
Permutation test p value = 0 (i.e., the whole set of 20000 genes is enriched for this TFBS compared to a genomic background)


How can I determine whether the set size is statistically valid (8 genes out of 20000)? Is there a test in R?
Is it worth analyzing such a small set of genes and trying to show how interesting and important their biology is?

7.1 years ago by
Charles Warden (8.0k), Duarte, CA, wrote:

Instead of a permutation test, a Fisher exact test or hypergeometric test is more commonly used to calculate gene set enrichment.
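
To make the hypergeometric idea concrete, here is a minimal sketch in plain Python (standard library only). It computes the upper-tail probability P(overlap >= k), which is the same quantity as a one-sided Fisher exact test for enrichment. The function name `hypergeom_upper_tail` is made up for illustration. The population size (20000) and the number of TFBS genes (8) come from the question; the gene-list size (200) and the overlap (3) are purely hypothetical numbers, not results from this thread.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the feature (here: the TFBS)."""
    total = comb(N, n)  # number of ways to pick any list of n genes
    # Count the lists that contain at least k feature-carrying genes.
    hits = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    return hits / total


N = 20000  # all genes (from the question)
K = 8      # genes with the TFBS (from the question)
n = 200    # ASSUMED size of a gene list of interest (hypothetical)
k = 3      # ASSUMED number of TFBS genes found in that list (hypothetical)

p = hypergeom_upper_tail(k, N, K, n)
print(f"P(overlap >= {k}) = {p:.3g}")
```

With only 8 TFBS genes in the population, the overlap with any list can never exceed 8, so the p-value is driven by just a handful of genes. That is worth keeping in mind when interpreting it.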

When doing something like GO enrichment (which should use a similar principle), I don't set a hard cutoff for the number of genes in the original gene set (in GO), but I typically like to see highly significant values (such as p < 1e-5), which should typically involve multiple enriched genes within the differentially expressed gene list (similar to your 2000 gene list, I assume). However, I return the entire list of results with p < 0.05. Sometimes biologists like to know if a single gene is affected (if that single gene is known to be really important).

BTW, you can try using the TRANSFAC enrichment tool in GATHER if you have a list of official gene symbols.

I personally like the upstream regulator function in IPA (based upon literature annotations rather than predicted motifs), but that is commercial software.

