I wonder if there is any tool or script that can generate random, non-overlapping BED coordinates relative to a given set of input BED coordinates.
~Chirag
Hi Chirag,
To get a non-overlapping set, you can use bedtools subtractBed together with the corresponding chromosome sizes: subtract your input from a BED file of whole chromosomes. You'll get a BED file that consists of the chromosomes minus the input. From this, you can use R, for instance, to sample smaller chunks.
Cheers,
Michael
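A minimal sketch of this approach, assuming a tab-delimited input BED that is already sorted and free of internal overlaps, plus a two-column chromosome sizes file. The file names, the toy coordinates, the chunk length and the seed are all hypothetical, and plain awk stands in here for both the subtractBed step and the R sampling step so that the example runs without extra tools.

```shell
# Hypothetical toy inputs: chromosome sizes and the regions to avoid.
printf 'chr1\t10000\nchr2\t8000\n' > genome.txt
printf 'chr1\t0\t1000\nchr1\t4000\t5000\nchr2\t100\t200\n' > input.bed

# Step 1: complement = chromosomes minus input (awk stand-in for subtractBed).
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { size[$1] = $2; order[++n] = $1; next }   # first file: sizes
     { if ($2 > pos[$1]) print $1, pos[$1] + 0, $2        # gap before interval
       pos[$1] = $3 }
     END { for (i = 1; i <= n; i++) {                     # gap after last interval
             c = order[i]
             if (pos[c] + 0 < size[c]) print c, pos[c] + 0, size[c] } }' \
    genome.txt input.bed | sort -k1,1 -k2,2n > complement.bed

# Step 2: draw one random chunk of LEN bases from every gap that is long enough.
# Chunks come from distinct gaps, so they cannot overlap each other or the input.
LEN=50
awk -v len="$LEN" -v seed=42 'BEGIN { FS = OFS = "\t"; srand(seed) }
     $3 - $2 >= len { start = $2 + int(rand() * ($3 - $2 - len + 1))
                      print $1, start, start + len, "rand-" NR }' \
    complement.bed > sampled.bed
```

Drawing only one chunk per gap is a deliberate simplification; sampling several chunks from a long gap would need an extra overlap check.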
Set your build of interest:
$ BUILD="hg19"
$ echo ${BUILD}
hg19
Set the number of elements you want to sample from ${BUILD}:
$ ELEMENTS=1234
$ echo ${ELEMENTS}
1234
Then sample with mysql, sort the BED data with sort-bed, map with bedmap to count the number of overlapping elements, use awk to filter for elements that only overlap themselves, use cut to strip the first column, and then write the results to a new BED file called random.bed:
$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D ${BUILD} -e "SET @rank:=0; SELECT DISTINCT chrom as chromcol, @start:=ROUND(RAND()*(size-100)) as startcol, @start+ROUND(RAND()*100)+1 as stopcol, CONCAT('id-',@rank:=@rank+1) as idcol, ROUND(RAND()*1000) as scorecol, IF(RAND()<0.5,'+','-') as strandcol FROM chromInfo, kgXref LIMIT ${ELEMENTS}" | sort-bed - | bedmap --count --echo --delim '\t' - | awk '$1==1' - | cut -f2- - > random.bed
This samples ELEMENTS BED elements, each between 1 and 101 bases long, within the chromosomal boundaries of the genome build BUILD, and keeps only those that do not overlap any other sampled element.
You'll get a subset of the ELEMENTS you asked for, because not all sampled elements will be disjoint. You can adjust the size parameter (the 100 in the query), depending on the region size distribution you need for randomly-sampled elements.
Once you have random.bed, you can apply set operations against regions of interest with bedops, etc.
The random genome fragments tool in RSA tools can do this. It won't exclude overlapping regions, though, so you'll have to filter them out afterwards.
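One way to do that filtering afterwards, as a rough sketch: assuming the fragments have been exported to a tab-delimited BED file (the file names and coordinates below are made up), sort them and keep an element only if it does not overlap the last element kept on the same chromosome. This greedy rule keeps the first of each overlapping group; it is not something RSA tools itself provides.

```shell
# Hypothetical fragments, some of which overlap each other.
printf 'chr1\t100\t200\tf1\nchr1\t150\t250\tf2\nchr1\t300\t400\tf3\nchr2\t50\t80\tf4\nchr2\t79\t120\tf5\n' > fragments.bed

# Sort by chromosome and start, then keep an element only if it starts at or
# after the end of the last element kept on the same chromosome.
sort -k1,1 -k2,2n fragments.bed |
  awk 'BEGIN { FS = OFS = "\t" }
       $1 != chrom || $2 + 0 >= last_end { print; chrom = $1; last_end = $3 + 0 }' \
  > fragments.nonoverlapping.bed
```

If you would rather drop every element that overlaps anything at all, a count-based filter like the bedmap one above is the stricter alternative.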
If I understood the question correctly, Bedtools shuffle should do exactly that.
$ cat A.bed
chr1  0  100  a1  1  +
chr1  0  1000 a2  2  -
$ cat my.genome
chr1  10000
chr2  8000
chr3  5000
chr4  2000
$ bedtools shuffle -i A.bed -g my.genome
chr4  1498  1598  a1  1  +
chr3  2156  3156  a2  2  -
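Note that a plain shuffle can place intervals on top of the original regions or on top of each other. A sketch of a stricter call, using the toy files from this example: -excl keeps the shuffled intervals out of the given regions, -noOverlapping keeps them from overlapping one another, and -seed makes the run reproducible. The seed value and the output file name are arbitrary choices, and the guard is only there so the snippet does not fail where bedtools is not installed.

```shell
# Recreate the toy inputs as tab-delimited files.
printf 'chr1\t0\t100\ta1\t1\t+\nchr1\t0\t1000\ta2\t2\t-\n' > A.bed
printf 'chr1\t10000\nchr2\t8000\nchr3\t5000\nchr4\t2000\n' > my.genome

# Shuffle while excluding the original regions and forbidding mutual overlaps.
if command -v bedtools >/dev/null 2>&1; then
  bedtools shuffle -i A.bed -g my.genome -excl A.bed -noOverlapping -seed 42 > shuffled.bed
else
  echo "bedtools not found; skipping shuffle" >&2
fi
```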
Thanks Michael,
Worked very well.
Cheers, Chirag