Question

Difference in distribution of two sets of sites/positions along the genome

Written 8.6 years ago by dmiuso

Hi

I have two sets of genomic positions (sites) on the mouse genome. One has about 13000 sites (let's call it the background set); the other has about 400 sites and is a subset of the background set. I would like to check whether the distribution (density?) along the genome differs locally between the two sets (13000 and 400).

I am a complete novice at R and at this type of bioinformatics, so I would very much appreciate advice on both the statistical test to apply and an R package that could be used. Thanks!
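
What follows is one common way to frame the question. It is a hedged sketch, not a definitive recipe. Because the 400 sites are a subset of the 13000, you can cut the genome into fixed windows and count the background sites n_w and the subset sites k_w in each window. You then ask whether k_w is larger than expected if the 400 had been drawn at random from the 13000. That is a one-sided hypergeometric test per window, followed by a Benjamini-Hochberg correction across windows. The sketch is in Python rather than R. Several details are illustrative assumptions, not part of the original post:

- the (chromosome, position) input format;
- the 1 Mb default window size;
- the function names `window_enrichment`, `hypergeom_upper_tail` and `benjamini_hochberg`.

```python
from collections import Counter
from math import comb


def window_key(chrom, pos, window_size):
    """Assign a site to a fixed-size window: (chromosome, window index)."""
    return (chrom, pos // window_size)


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N sites in total, K subset sites, n sites in window)."""
    top = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return numerator / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def window_enrichment(background, subset, window_size=1_000_000):
    """Per-window enrichment of subset sites relative to the background.

    background, subset: iterables of (chrom, pos) tuples; subset must be
    contained in background. Returns (window, n_background, n_subset, p, q) rows.
    """
    bg = set(background)
    sub = set(subset)
    if not sub <= bg:
        raise ValueError("subset sites must all be in the background set")
    bg_counts = Counter(window_key(chrom, pos, window_size) for chrom, pos in bg)
    sub_counts = Counter(window_key(chrom, pos, window_size) for chrom, pos in sub)
    N, K = len(bg), len(sub)
    windows = sorted(bg_counts)
    pvals = [hypergeom_upper_tail(sub_counts[w], N, K, bg_counts[w]) for w in windows]
    qvals = benjamini_hochberg(pvals)
    return [(w, bg_counts[w], sub_counts[w], p, q)
            for w, p, q in zip(windows, pvals, qvals)]


if __name__ == "__main__":
    # Hypothetical toy data: 20 background sites in two 1 kb windows on chr1,
    # with all 3 subset sites falling in the first window.
    background = [("chr1", i * 100) for i in range(20)]
    subset = background[:3]
    for row in window_enrichment(background, subset, window_size=1000):
        print(row)
```

Window size is a judgment call. Small windows hold few sites and give little power, while large windows blur local differences. To test for depletion rather than enrichment, you would use the lower tail instead.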

Could you clarify what you're trying to do? If your 400 sites are a subset of the 13000 sites, what do you expect to be different about them? If the 400 sites are a random sample from the population of 13000, you would not expect any statistical difference. If they are not a random sample, how they were obtained or selected could tell you what distinguishes them from the others.

Written 8.6 years ago by Jean-Karim Heriche
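
The point about random sampling can be checked numerically. Suppose the subset were drawn uniformly at random from the background, with K subset sites, N background sites, and n_w background sites in a window w. The expected number of subset sites in w is then K * n_w / N. So dense background regions attract proportionally more subset sites, and that alone is not a difference. The Python sketch below compares that formula with the average of many random draws. The window labels, counts and function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def expected_subset_counts(bg_windows, n_subset):
    """Expected subset sites per window if the subset is a uniform random draw:
    E[k_w] = n_subset * n_w / N."""
    bg_counts = Counter(bg_windows)
    n_total = len(bg_windows)
    return {w: n_subset * n_w / n_total for w, n_w in bg_counts.items()}


def simulated_subset_counts(bg_windows, n_subset, n_rep=1000, seed=0):
    """Average per-window counts over n_rep random draws without replacement."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_rep):
        totals.update(rng.sample(bg_windows, n_subset))
    return {w: totals[w] / n_rep for w in set(bg_windows)}


if __name__ == "__main__":
    # Hypothetical, deliberately uneven background: 13000 sites in 3 windows,
    # each list entry being the window label of one background site.
    bg_windows = ["A"] * 10000 + ["B"] * 2500 + ["C"] * 500
    expected = expected_subset_counts(bg_windows, 400)
    simulated = simulated_subset_counts(bg_windows, 400)
    for w in sorted(expected):
        print(w, round(expected[w], 1), round(simulated[w], 1))
```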

Thanks a lot for asking and trying to help, Jean-Karim!

I believe they may differ in local density along the genome. The background is as follows. We ran the Affymetrix Human Methylation 450K kit on mouse samples. Aligning these sites (probes) to the mouse genome showed that about 13000 of them have 3 or fewer mismatches, so we took those to work with. Of these 13000 sites, about 400 turned out to be significantly hypomethylated in the knockout mouse vs. wild type. I want to check whether these 400 sites have the same distribution (density) along the genome as the 13000 "background" sites (which are themselves far from evenly distributed). Dmitry

Written 8.6 years ago by dmiuso
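
The background itself is far from uniform, so a test that conditions on it avoids comparing against an evenly covered genome. Below is a minimal permutation-test sketch. It assumes each site has already been assigned to a genomic window. The window labels, the Pearson-style dispersion statistic and the 999 permutations are illustrative choices, not something from the thread.

The procedure has two steps:

1. Measure how far the observed subset counts deviate from the expected K * n_w / N.
2. Redraw K sites at random from the background many times, and count how often chance alone gives a deviation at least as large.

```python
import random
from collections import Counter


def dispersion_stat(subset_counts, bg_counts, n_total, n_subset):
    """Pearson-style statistic: sum over windows of (observed - expected)^2 / expected."""
    stat = 0.0
    for w, n_w in bg_counts.items():
        expected = n_subset * n_w / n_total
        stat += (subset_counts.get(w, 0) - expected) ** 2 / expected
    return stat


def permutation_test(bg_windows, subset_windows, n_perm=999, seed=0):
    """Compare the observed subset against random same-size draws from the background.

    bg_windows / subset_windows: one window label per site (hypothetical format).
    Returns (observed statistic, permutation p-value).
    """
    rng = random.Random(seed)
    bg_counts = Counter(bg_windows)
    n_total, n_subset = len(bg_windows), len(subset_windows)
    observed = dispersion_stat(Counter(subset_windows), bg_counts, n_total, n_subset)
    n_extreme = 0
    for _ in range(n_perm):
        null_counts = Counter(rng.sample(bg_windows, n_subset))
        if dispersion_stat(null_counts, bg_counts, n_total, n_subset) >= observed:
            n_extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (n_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: 10 windows with 100 background sites each, and a
    # 50-site subset packed entirely into window "w0".
    bg_windows = [f"w{i}" for i in range(10) for _ in range(100)]
    observed, p = permutation_test(bg_windows, ["w0"] * 50)
    print(observed, p)  # 450.0 0.001 (no random draw reaches the observed value)
```

A small p-value suggests the subset is not spread like a random draw from the background. It does not by itself say which windows differ, so a per-window test would be the natural follow-up.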