I would like to generate a random distribution of simulated CNVs to compare with my original data (chr, start, end positions). I would like each simulated CNV to stay on the same chromosome and to keep the length of each CNV fixed, changing only its start/end positions. I am wondering what the easiest and most correct way to do that is. I know the bedtools random and shuffle options, but neither of them allows me to keep the size of each individual CNV fixed. Thank you very much for your help! Cristina
You can use the R/Bioconductor package regioneR for that. It has a function called randomizeRegions that moves your CNV regions around while maintaining their size, and with allow.overlaps=FALSE it also ensures that the randomized regions do not overlap each other. If you set the per.chromosome parameter to TRUE, it will randomize your regions within their original chromosomes, which is what you need. By default it assumes you are working on human with genome version "hg19" and gets the correct chromosome sizes automatically, but you can specify a different genome with the genome parameter.
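A minimal sketch of that call, assuming regioneR (and GenomicRanges) are installed. To keep it self-contained it uses a tiny made-up two-chromosome genome and three made-up CNVs (both hypothetical data, not from the question) and passes mask=NA so that no mask is applied. With real data you would load your own chr/start/end table and use genome="hg19" (or your assembly) instead.

```r
library(GenomicRanges)
library(regioneR)

# Hypothetical toy genome: one region per whole chromosome (chr, start, end)
genome <- toGRanges(data.frame(chr   = c("chr1", "chr2"),
                               start = c(1, 1),
                               end   = c(1000000, 500000)))

# Hypothetical CNVs in the same chr/start/end layout as the original data
cnvs <- toGRanges(data.frame(chr   = c("chr1", "chr1", "chr2"),
                             start = c(10000, 200000, 50000),
                             end   = c(25000, 260000, 52000)))

set.seed(42)  # make the randomization reproducible
rand <- randomizeRegions(cnvs,
                         genome = genome,
                         mask = NA,               # explicitly use no mask
                         per.chromosome = TRUE,   # keep each CNV on its chromosome
                         allow.overlaps = FALSE)  # random CNVs must not overlap

rand
```

Each randomized region keeps the width of its original CNV and its chromosome; only the start/end coordinates change.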
If you want to create the random regions to perform a permutation test, you can use the permTest function in regioneR, which will take care of the whole process.
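A sketch of such a permutation test, assuming regioneR is installed. The toy genome, the CNVs and the "features" set the CNVs are tested against are all hypothetical; numOverlaps is one of regioneR's evaluation functions, and the extra arguments (genome, mask, per.chromosome) are forwarded by permTest to the randomization function. ntimes is kept small here for speed; a real analysis would use more permutations.

```r
library(GenomicRanges)
library(regioneR)

# Hypothetical toy genome: one region per whole chromosome
genome <- toGRanges(data.frame(chr = c("chr1", "chr2"),
                               start = c(1, 1),
                               end = c(1000000, 500000)))

# Hypothetical CNVs and a hypothetical feature set (e.g. genes of interest)
cnvs <- toGRanges(data.frame(chr = c("chr1", "chr1", "chr2"),
                             start = c(10000, 200000, 50000),
                             end = c(25000, 260000, 52000)))
features <- toGRanges(data.frame(chr = c("chr1", "chr2"),
                                 start = c(12000, 51000),
                                 end = c(30000, 60000)))

set.seed(123)
pt <- permTest(A = cnvs, B = features, ntimes = 200,
               randomize.function = randomizeRegions,  # size-preserving shuffle
               evaluate.function = numOverlaps,        # statistic per permutation
               genome = genome, mask = NA,             # forwarded to randomizeRegions
               per.chromosome = TRUE)

pt                # summary: observed value, permuted mean, p-value, z-score
pt[[1]]$pval      # p-value for the (single) evaluation function
```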
Important note: this function will not place any region on "masked" parts of the genome. By default it uses the default mask from the BSgenome package (which includes centromeres, RepeatMasker regions, ...), but in your case you should probably create a mask with only the centromere regions (code to get them below) or disable the mask completely with mask=NA.
library(karyoploteR)
cytobands <- filterChromosomes(getCytobands("hg19"))
centromeres <- cytobands[cytobands$gieStain=="acen"]
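One way to plug those centromeres in as the only mask, assuming karyoploteR and regioneR are installed. As an assumption of this sketch, the genome (one region per chromosome) is built with range() over the cytobands so that no BSgenome package is required; genome="hg19" should also work if BSgenome.Hsapiens.UCSC.hg19 is available. The CNV coordinates are hypothetical.

```r
library(GenomicRanges)
library(regioneR)
library(karyoploteR)

# Centromeres from the hg19 cytobands, as in the answer's code
cytobands <- filterChromosomes(getCytobands("hg19"))
centromeres <- cytobands[cytobands$gieStain == "acen"]

# Assumption: one region per chromosome, derived from the cytoband extents
genome <- range(cytobands)

# Hypothetical hg19 CNVs
cnvs <- toGRanges(data.frame(chr = c("chr1", "chr7", "chrX"),
                             start = c(1500000, 55000000, 30000000),
                             end = c(1650000, 55250000, 30020000)))

set.seed(1)
rand <- randomizeRegions(cnvs,
                         genome = genome,
                         mask = centromeres,      # only centromeres are excluded
                         per.chromosome = TRUE,
                         allow.overlaps = FALSE)
rand
```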