Question

Overlap between 2 sets of genomic regions of differing size

11.1 years ago

Rubal ▴ 350

Hello Everybody,

I am aware that there are similar questions on testing for significant overlap between lists of genes or chip-seq data but I think that my question is different enough to warrant its own post.

I have two lists of genomic regions. One list contains regions of fixed size: all are 100kb. The other list has regions of variable size, ranging from a few to several megabases. I want to test whether the overlap between these regions is greater than expected by chance. I am not sure of the best way to calculate this, or whether scripts/software already exist.

One could ask whether there is any overlap at all between the lists (qualitative, e.g. two regions either overlap or they do not), or what the extent of the overlap between regions is (quantitative, e.g. two regions overlap by 3000bp). At the moment I am most interested in the former, qualitative overlap, and would like to know if it is more than expected by chance.

The best method I can think of at the moment is random permutation of regions of the same size. However, I think these permutations would have to take place in a constrained space equal to the genome of the species I am testing; otherwise the probability of overlap by chance cannot be correctly estimated. I think such a task is beyond my programming abilities, so if someone can point out an existing script or suggest an alternative solution, that would be most appreciated.

Thanks in advance!

Best,
Rubal

genome sequence windows next-gen • 5.1k views

Updated 3.7 years ago by Ram • written 11.1 years ago by Rubal

bernatgel · Comment · 5.8 years ago

There's a good list of tools and a few detailed answers in this other question.

Ram · Answer 1 · 2014-06-02

I strongly recommend GenometriCorr, an R package for spatial correlation of genome-wide interval datasets (PLOS Computational Biology paper: "Exploring Massive, Genome Scale Datasets with the GenometriCorr Package").

Input: lists of genomic regions + chromosome sizes

Output: statistics + visualizations

They also have example data and an informative guide on their page.

And here is an example paper that used this package to test for correlation between transposons and lncRNAs.
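Separately from GenometriCorr, the constrained permutation idea described in the question can be illustrated in a few lines of plain Python. This is only a minimal sketch under stated assumptions, not a replacement for a dedicated package: regions are `(chrom, start, end)` tuples with half-open coordinates, the function names (`any_overlap`, `shuffle_region`, `permutation_test`) are hypothetical, and random placement is uniform over each chromosome, so assembly gaps and unmappable sequence are ignored. The fixed-size regions are reshuffled within the chromosome sizes while the variable-size list stays in place, and the statistic is the qualitative one from the question: the number of shuffled regions that overlap anything in the other list.

```python
import random

def any_overlap(region, targets_by_chrom):
    """Return True if region overlaps at least one target (half-open coordinates)."""
    chrom, start, end = region
    for t_start, t_end in targets_by_chrom.get(chrom, []):
        if start < t_end and t_start < end:
            return True
    return False

def count_hits(queries, targets_by_chrom):
    """Number of query regions that overlap any target region."""
    return sum(any_overlap(q, targets_by_chrom) for q in queries)

def shuffle_region(length, chrom_sizes, rng):
    """Place a region of the given length at random without leaving a chromosome."""
    # Weight each chromosome by its number of valid start positions.
    chroms = [c for c, size in chrom_sizes.items() if size >= length]
    weights = [chrom_sizes[c] - length + 1 for c in chroms]
    chrom = rng.choices(chroms, weights=weights, k=1)[0]
    start = rng.randint(0, chrom_sizes[chrom] - length)  # inclusive bounds
    return (chrom, start, start + length)

def permutation_test(queries, targets, chrom_sizes, n_perm=1000, seed=0):
    """Return (observed hits, one-sided empirical p-value for enrichment)."""
    rng = random.Random(seed)
    targets_by_chrom = {}
    for chrom, start, end in targets:
        targets_by_chrom.setdefault(chrom, []).append((start, end))

    observed = count_hits(queries, targets_by_chrom)
    at_least = 0
    for _ in range(n_perm):
        # Shuffle every query region, preserving its length.
        shuffled = [shuffle_region(end - start, chrom_sizes, rng)
                    for _, start, end in queries]
        if count_hits(shuffled, targets_by_chrom) >= observed:
            at_least += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    chrom_sizes = {"chr1": 5_000_000, "chr2": 3_000_000}
    fixed = [("chr1", 1_000_000, 1_100_000), ("chr2", 500_000, 600_000)]
    variable = [("chr1", 900_000, 2_400_000), ("chr2", 2_000_000, 2_500_000)]
    print(permutation_test(fixed, variable, chrom_sizes, n_perm=500))
```

The same skeleton extends to the quantitative question by replacing `count_hits` with a function that sums overlapping base pairs. For real data, the placement step would usually also need to exclude gaps and other regions where the original intervals could never have been called.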