2.4 years ago by
random will create random locations of a particular length, and
shuffle will create locations that are length-matched to an input BED file. One downside of random is that the strand of each location will also be random, which means that if there is some strand bias in your experimentally derived data, you might get significant differences that are not real.
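To make the difference concrete, here is a toy pure-Python sketch of the two ideas. It is not how bedtools is implemented, and the function names, tuple layout and chromosome sizes are made up for illustration. It assumes, as described above, that the shuffle-style control keeps each input interval's length and strand, while the random-style control uses one fixed length and draws the strand at random.

```python
import random

def random_intervals(n, length, chrom_sizes, rng):
    """Fixed-length intervals at random positions with a RANDOM strand."""
    chroms = list(chrom_sizes)
    out = []
    for _ in range(n):
        chrom = rng.choice(chroms)
        start = rng.randint(0, chrom_sizes[chrom] - length)
        out.append((chrom, start, start + length, rng.choice("+-")))
    return out

def shuffle_intervals(intervals, chrom_sizes, rng):
    """Move each input interval to a random position, keeping its length and strand."""
    chroms = list(chrom_sizes)
    out = []
    for _chrom, start, end, strand in intervals:
        length = end - start
        new_chrom = rng.choice(chroms)
        new_start = rng.randint(0, chrom_sizes[new_chrom] - length)
        out.append((new_chrom, new_start, new_start + length, strand))
    return out

# Hypothetical toy genome and observed regions (all on the + strand).
chrom_sizes = {"chr1": 10_000, "chr2": 5_000}
observed = [
    ("chr1", 100, 350, "+"),
    ("chr1", 900, 1000, "+"),
    ("chr2", 40, 540, "+"),
]

rng = random.Random(0)
controls_random = random_intervals(len(observed), 200, chrom_sizes, rng)
controls_shuffled = shuffle_intervals(observed, chrom_sizes, rng)

print("random-style :", controls_random)
print("shuffle-style:", controls_shuffled)
```

In the shuffle-style output every control has the same length and strand as one of the observed regions, which is the property that matters if your data are strand biased.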
I want to use the end of genes
Will these be defined as regions X bp from the transcription termination site, so that they all have the same size? If not, and in view of the strand issues, my advice would be to use
shuffle. It will alleviate strand bias issues and give you more control over what your control regions look like, that is, they will be better matched to your locations.
Also, generate the control set multiple times (say 1000) and perform the comparison each time - you can then calculate the average and standard deviation across those 1000 permutations. This will show whether the effect that you see (or not) is stable.
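A minimal sketch of that repeated-control idea, again in pure Python rather than with bedtools itself. The statistic used here (how many regions overlap a feature set), the toy coordinates and all function names are assumptions for illustration; in practice you would plug in whatever comparison you are actually making.

```python
import random
import statistics

def shuffle_intervals(intervals, chrom_sizes, rng):
    """Simplified shuffle: new random position, same length and strand."""
    chroms = list(chrom_sizes)
    out = []
    for _chrom, start, end, strand in intervals:
        length = end - start
        new_chrom = rng.choice(chroms)
        new_start = rng.randint(0, chrom_sizes[new_chrom] - length)
        out.append((new_chrom, new_start, new_start + length, strand))
    return out

def count_overlaps(intervals, features):
    """Number of intervals overlapping at least one feature (strand ignored)."""
    return sum(
        any(c == fc and s < fe and fs < e for fc, fs, fe in features)
        for c, s, e, _strand in intervals
    )

def permutation_summary(observed, features, chrom_sizes, n_perm=1000, seed=0):
    """Shuffle n_perm times; return mean, standard deviation and all counts."""
    rng = random.Random(seed)
    counts = [
        count_overlaps(shuffle_intervals(observed, chrom_sizes, rng), features)
        for _ in range(n_perm)
    ]
    return statistics.mean(counts), statistics.stdev(counts), counts

# Hypothetical toy data: one chromosome, two features, four observed regions.
chrom_sizes = {"chr1": 10_000}
features = [("chr1", 0, 2_000), ("chr1", 5_000, 6_000)]
observed = [
    ("chr1", 100, 300, "+"),
    ("chr1", 1_500, 1_700, "-"),
    ("chr1", 5_200, 5_400, "+"),
    ("chr1", 8_000, 8_200, "+"),
]

observed_count = count_overlaps(observed, features)
mean_count, sd_count, counts = permutation_summary(observed, features, chrom_sizes)

print("observed overlaps:", observed_count)
print("control mean:", mean_count, "control SD:", sd_count)
```

Comparing the observed value with the mean and spread of the 1000 control values tells you whether a difference is consistently there or only appears for some control sets.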