Question: Assessing significance of protein binding data inside defined genomic intervals
Sakti (United States) wrote 5.8 years ago:

Dear friends,

Once again I am back to consult your wisdom. Very recently I obtained a list of regions on mouse chromosome 7 that contact a specific nuclear body (sorry, I cannot give more details about it). Several proteins (e.g. cohesin) overlap these regions. However, I would like to know how significant these overlap ratios are compared to a randomly chosen region set with the same length characteristics as my original nuclear body dataset.

Does anyone know of a tool I could use to perform this analysis? I found the R package named coocur, but it analyzes co-occurrence of protein binding sites, which I think is a little different from what I'm trying to do.

Also, in case no such program exists, what would be the best way to proceed in terms of statistical tests? I was thinking of writing a script that chooses regions at random with the same lengths as my nuclear body dataset, calculates their overlaps, and then compares those ratios with my nuclear body ratios. I think bootstrapping may also be necessary, but I'm not sure which statistical test I should use in that case.
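The randomization idea described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not an established tool: intervals are half-open `(start, end)` tuples on a single chromosome, `chrom_len` is a value the user supplies, random placement is uniform (ignoring assembly gaps and mappability), and all function names are hypothetical. The empirical p-value is the share of random sets whose overlap fraction is at least as large as the observed one, with the usual +1 correction.

```python
import bisect
import random


def overlap_fraction(regions, peaks):
    """Fraction of regions that overlap at least one peak (half-open intervals)."""
    peaks = sorted(peaks)
    starts = [s for s, _ in peaks]
    # Running maximum of peak ends, so nested peaks are handled correctly.
    max_end, cur = [], 0
    for _, e in peaks:
        cur = max(cur, e)
        max_end.append(cur)
    hits = 0
    for s, e in regions:
        i = bisect.bisect_left(starts, e)  # peaks[:i] start before the region ends
        if i > 0 and max_end[i - 1] > s:   # and at least one of them ends after it starts
            hits += 1
    return hits / len(regions)


def randomize(regions, chrom_len, rng):
    """Place each region at a uniformly random position, keeping its length."""
    out = []
    for s, e in regions:
        length = e - s
        new_s = rng.randrange(0, chrom_len - length + 1)
        out.append((new_s, new_s + length))
    return out


def permutation_p(regions, peaks, chrom_len, n_iter=1000, seed=0):
    """Return (observed overlap fraction, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = overlap_fraction(regions, peaks)
    n_ge = 0
    for _ in range(n_iter):
        if overlap_fraction(randomize(regions, chrom_len, rng), peaks) >= observed:
            n_ge += 1
    return observed, (n_ge + 1) / (n_iter + 1)
```

With real data one would call `permutation_p(nuclear_body_regions, protein_peaks, chrom_len)` once per protein. The smallest p-value this can report is `1 / (n_iter + 1)`, so the number of iterations limits the resolution.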

I'd appreciate any insight you may provide.



modified 5.2 years ago by Biostar • written 5.8 years ago by Sakti

"Nuclear body" is a rather broad term. Can you be a bit more specific?
Transcription factor, enhancer, etc.?

written 5.8 years ago by Khader Shameer

I think the answers to this question are what you are looking for: Calculating fold-enrichment of ChIP-seq peaks intersecting with promoters (vs. genome average)

written 5.2 years ago by Fidel
Michael Dondrup (Bergen, Norway) wrote 5.7 years ago:

An experiment of drawing random genomic positions with two outcomes, overlap with a gene (success) or no overlap (failure), is a Bernoulli trial with success probability C/G (C = number of bases in genes, G = total number of bases in the genome). The binomial distribution is therefore suitable for calculating the probability of observing N or more successes in M trials, i.e. the upper tail of the cumulative distribution function. This doesn't depend on how your genomic location is selected.
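As a rough illustration of that calculation, the upper tail P(X ≥ N) for X ~ Binomial(M, C/G) can be summed directly. The function name and the example numbers are hypothetical, and the direct sum is only practical for a modest number of trials; for large M a library routine such as `scipy.stats.binom.sf(N - 1, M, p)` would be the more robust choice.

```python
from math import comb


def binom_upper_tail(n_success, n_trials, p):
    """P(X >= n_success) for X ~ Binomial(n_trials, p), by direct summation."""
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_success, n_trials + 1)
    )


# Hypothetical numbers: 30 of 100 random positions hit a feature
# that covers 20% of the genome (p = C/G = 0.2).
p_value = binom_upper_tail(30, 100, 0.2)
```

A small `p_value` would indicate more overlaps than expected if positions were placed independently and uniformly across the genome.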

modified 5.7 years ago • written 5.7 years ago by Michael Dondrup