Question: Overlap between 2 sets of genomic regions of differing size
Asked 5.1 years ago by Rubal (Germany):

Hello Everybody,

I am aware that there are similar questions on testing for significant overlap between lists of genes or ChIP-seq data, but I think that my question is different enough to warrant its own post.

I have two lists of genomic regions. One list contains regions of fixed size: all are 100 kb. The other list has regions of variable size, ranging from a few to several megabases. I want to test whether these regions overlap more than expected by chance. I am not sure of the best way to calculate this, or whether scripts or software for it already exist.

One could ask whether there is any overlap at all between the lists (qualitative, e.g. two regions either overlap or they do not), or how large the overlap between regions is (quantitative, e.g. two regions overlap by 3000 bp). At the moment I am most interested in the former, qualitative overlap, and would like to know whether it is more than expected by chance.
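To make the two notions concrete, here is a minimal Python sketch. It assumes regions are stored as `(chrom, start, end)` tuples with half-open coordinates; the function names `overlap_bp` and `overlaps` and the example coordinates are made up for illustration and do not come from any existing package.

```python
def overlap_bp(a, b):
    """Quantitative overlap: number of bases shared by two regions.

    Regions are (chrom, start, end) tuples with half-open coordinates
    (an assumption of this sketch).
    """
    if a[0] != b[0]:  # regions on different chromosomes never overlap
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def overlaps(a, b):
    """Qualitative overlap: True if the regions share at least one base."""
    return overlap_bp(a, b) > 0


# Hypothetical example: a 100 kb region and a larger variable-size region.
fixed = ("chr1", 100_000, 200_000)
variable = ("chr1", 197_000, 1_500_000)

print(overlaps(fixed, variable))    # True
print(overlap_bp(fixed, variable))  # 3000
```

The qualitative test is simply the quantitative one thresholded at zero, so the same representation can serve both questions.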

The best method I can think of at the moment is random permutation of regions of the same size. However, I think these permutations would have to take place in a constrained space equal to the genome of the species I am testing; otherwise the probability of overlap by chance cannot be estimated correctly. I think such a task is beyond my programming abilities, so if someone can point out an existing script or suggest an alternative solution, that would be most appreciated.
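A rough sketch of such a constrained permutation test in Python follows. It is only an illustration under stated assumptions, not an established tool: regions are `(chrom, start, end)` tuples with half-open coordinates, `chrom_sizes` is a hypothetical toy genome, the function names are invented here, shuffled regions are allowed to overlap each other, and assembly gaps or unmappable sequence are ignored.

```python
import random


def count_overlapping(query, reference):
    """Number of query regions that overlap at least one reference region."""
    return sum(
        any(c == chrom and s < end and start < e for c, s, e in reference)
        for chrom, start, end in query
    )


def shuffle_regions(regions, chrom_sizes, rng):
    """Move each region to a random position, keeping its length and
    keeping it inside the bounds of a chromosome."""
    shuffled = []
    for _, start, end in regions:
        length = end - start
        # Only chromosomes long enough to hold the region are candidates;
        # weight each by its number of valid start positions.
        candidates = [c for c, size in chrom_sizes.items() if size >= length]
        weights = [chrom_sizes[c] - length + 1 for c in candidates]
        chrom = rng.choices(candidates, weights=weights, k=1)[0]
        new_start = rng.randint(0, chrom_sizes[chrom] - length)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def permutation_test(query, reference, chrom_sizes, n_perm=1000, seed=0):
    """One-sided empirical p-value for 'more overlap than expected by chance'."""
    rng = random.Random(seed)
    observed = count_overlapping(query, reference)
    null = [
        count_overlapping(shuffle_regions(query, chrom_sizes, rng), reference)
        for _ in range(n_perm)
    ]
    # The +1 correction keeps the empirical p-value from being exactly zero.
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): fixed 100 kb regions vs. variable-size regions.
chrom_sizes = {"chr1": 10_000_000, "chr2": 5_000_000}
query = [
    ("chr1", 1_000_000, 1_100_000),
    ("chr1", 4_000_000, 4_100_000),
    ("chr2", 2_000_000, 2_100_000),
]
reference = [("chr1", 900_000, 2_500_000), ("chr2", 1_950_000, 2_050_000)]

observed, p_value = permutation_test(query, reference, chrom_sizes)
print(observed, p_value)
```

Here only the fixed-size list is shuffled while the variable-size list stays in place. Which list to randomise, and whether shuffled regions should be kept on their original chromosome, are design choices that change the null hypothesis being tested.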

Thanks in advance!

Best,

Rubal

windows next-gen sequence genome • 2.6k views
Answer by PoGibas (Vilnius), 5.1 years ago:

I strongly recommend GenometriCorr, an R package for spatial correlation of genome-wide interval datasets (PLOS Computational Biology paper: "Exploring Massive, Genome Scale Datasets with the GenometriCorr Package").

Input: lists of genomic regions + chromosome sizes

Output: statistics + visualizations

They also have example data and an informative guide on their page.

And here is an example paper that used this package to test the correlation between transposons and lncRNAs.
