Question: Recommended tools for positional gene enrichment (PGE) analysis?
3.8 years ago by
I am looking for recommended methods for positional gene enrichment (PGE) analysis in the human genome. By PGE I mean determining, for a given set of genes, whether these genes show significant positional clustering along a chromosome or the genome.

I think this problem consists of two intertwined tasks: (1) identify positional gene clusters, and (2) test these clusters for statistical significance. Ideally, known biases (e.g. non-random gene density) should be accounted for and a multiple testing correction applied.
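To make the multiple testing part concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. Choosing this procedure is my own assumption, not something taken from the paper below, and the function name and p-values are made up for illustration. Note that p-values from overlapping candidate regions are not independent, so the adjusted values should be treated as a rough guide.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values, one per candidate region (hypothetical numbers).
region_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted_pvalues = benjamini_hochberg(region_pvalues)
```

In R, `p.adjust(p, method = "BH")` performs the same adjustment.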

Here is one example of a paper that addresses this problem:

De Preter et al. (2008): Positional gene enrichment analysis of gene sets for high-resolution identification of overrepresented chromosomal regions

I am looking for more recent, stand-alone implementations of such a method (e.g. an R package). A useful tool would not necessarily be limited to testing genes for positional enrichment, but could work more generally with any type of feature that has a genomic coordinate (e.g. take a list of known enhancers and identify "super enhancer" regions).

EDIT: From reading the above paper, I realized that the problem is not so much finding significantly enriched regions (this is as simple as performing a hypergeometric test), but eliminating overlaps between enriched regions. Since I have now found out that the authors provide their algorithm as a Perl script on their home page, I will start with this method.
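For anyone unsure what the hypergeometric test looks like for a single region, here is a minimal sketch in plain Python. It is not the authors' Perl implementation; the function names, the toy gene coordinates and the choice to count genes (rather than base pairs) as the background are my own assumptions. Counting genes is one simple way to account for non-random gene density. The overlap elimination step is not covered here.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K query genes, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def region_pvalue(background, query, chrom, start, end):
    """Enrichment p-value of query genes in chrom:start-end (inclusive).

    background: dict mapping gene id -> (chromosome, position)
    query: set of gene ids of interest
    """
    in_region = {
        gene for gene, (c, pos) in background.items()
        if c == chrom and start <= pos <= end
    }
    N = len(background)                   # all annotated genes
    K = len(query & background.keys())    # query genes in the annotation
    n = len(in_region)                    # genes in the region
    k = len(in_region & query)            # query genes in the region
    return hypergeom_sf(k, N, K, n)


# Toy annotation: 100 hypothetical genes spaced 1 kb apart on chr1.
background = {f"g{i}": ("chr1", i * 1000) for i in range(100)}
query = {"g10", "g11", "g12", "g13", "g50"}
p = region_pvalue(background, query, "chr1", 10_000, 13_000)
```

In R, the same tail probability is `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.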

2.4 years ago by
Did you manage to find a stand-alone implementation in R? I am not very familiar with Perl, but I will also use the authors' script if required.

What about any of the GSEA R packages? MSigDB has a positional gene set collection (C1):

Gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene. These gene sets are helpful in identifying effects related to chromosomal deletions or amplifications, dosage compensation, epigenetic silencing, and other regional effects.

I haven't looked for one, because the Perl script works well and I am still using it in my analysis pipelines (in addition to GSEA with MSigDB positional gene sets).

