I have estrogen receptor (ER) binding locations in a BED file, and I would like to show that specific mutations are more likely to occur near these ER binding sites than elsewhere in the genome. For this I need to generate random locations across the whole genome as a background. It looks like bedtools shuffle will do what I need, but I would appreciate guidance on some specific parts of the procedure.
First, I have read in some Google Groups threads that the shuffled BED files do not have a uniform distribution and tend to be biased toward specific regions.
Second, for the inclusion and exclusion options, should I take gaps into account? The bedtools shuffle wiki suggests that only mappable locations should be included. I tried to find a BED file that specifies these locations but could not find one. How necessary is this?
My reference is GRCh37.
For these reasons, I would like to at least validate my process against your experience.
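
To make the question concrete, here is a minimal Python sketch of the kind of procedure I have in mind. It is only an illustration under stated assumptions, not what bedtools shuffle does internally: the chromosome sizes, gap regions, sites, and mutations are made-up toy values, and the function names (`shuffle_intervals`, `count_near`, `empirical_p`) are hypothetical. Each site is moved to a random position (chromosome chosen in proportion to its length), placements that overlap a gap are rejected, and the observed number of mutations near the real sites is compared with the counts obtained from the shuffled sites.

```python
import random

# Toy genome: chromosome sizes and gap regions (half-open intervals).
# All values are invented for illustration; real ones would come from GRCh37.
CHROM_SIZES = {"chr1": 10_000, "chr2": 6_000}
GAPS = {"chr1": [(4_000, 5_000)], "chr2": [(0, 500)]}

# Toy ER binding sites (chrom, start, end) and mutations (chrom, position).
SITES = [("chr1", 1_000, 1_200), ("chr2", 3_000, 3_200)]
MUTATIONS = [("chr1", 1_050), ("chr1", 1_190), ("chr2", 3_100), ("chr1", 8_000)]


def overlaps_any(chrom, start, end, regions):
    """Return True if [start, end) overlaps any region listed for chrom."""
    return any(s < end and start < e for s, e in regions.get(chrom, []))


def shuffle_intervals(intervals, rng, max_tries=1000):
    """Relocate each interval to a random gap-free position, keeping its length."""
    chroms = list(CHROM_SIZES)
    weights = [CHROM_SIZES[c] for c in chroms]
    shuffled = []
    for _chrom, start, end in intervals:
        length = end - start
        for _ in range(max_tries):
            # Pick a chromosome with probability proportional to its length.
            chrom = rng.choices(chroms, weights=weights)[0]
            if CHROM_SIZES[chrom] < length:
                continue
            new_start = rng.randrange(CHROM_SIZES[chrom] - length + 1)
            new_end = new_start + length
            # Reject placements that touch an excluded (gap) region.
            if not overlaps_any(chrom, new_start, new_end, GAPS):
                shuffled.append((chrom, new_start, new_end))
                break
        else:
            raise RuntimeError("could not place interval outside excluded regions")
    return shuffled


def count_near(mutations, intervals, window):
    """Count mutations lying within `window` bp of at least one interval."""
    return sum(
        any(c == chrom and s - window <= pos < e + window for c, s, e in intervals)
        for chrom, pos in mutations
    )


def empirical_p(mutations, sites, window, n_perm, seed=0):
    """Return the observed count and a one-sided empirical p-value."""
    rng = random.Random(seed)
    observed = count_near(mutations, sites, window)
    # Number of shuffles whose count is at least as large as the observed one.
    hits = sum(
        count_near(mutations, shuffle_intervals(sites, rng), window) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


observed, p_value = empirical_p(MUTATIONS, SITES, window=100, n_perm=200)
print(f"observed mutations near sites: {observed}, empirical p-value: {p_value:.3f}")
```

With real data I would replace the toy dictionaries with the GRCh37 chromosome sizes and a gap or mappability BED file, and let bedtools shuffle do the relocation step. The sketch is only meant to show where the exclusion regions enter the procedure.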