Create Random Bed File
        1 
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        Hello everyone,
Is there a rapid way to create a random BED file taking in consideration the genome version using MySQL access to UCSC ? No criteria here except the length of the fragments that we can call L
Thanks
Rad
                    
                 
                 
                
                
                    
                    
    
        
        
            ucsc
         
        
    
        
        
            bed
         
        
    
    
        • 3.3k views
    
 
                
                 
            
            
         
     
 
     
    
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        the table chromInfo  contains the length of the chromosomes. In the following query I only use another table (kgXref) to amplify the number of rows.
$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
set @rank:=0;
select chrom,
 @start:=ROUND(RAND()*(size-100)),
 @start+ROUND(RAND()*100),
 concat("ID",@rank:=@rank+1),
 ROUND(RAND()*1000),
 IF(RAND()<0.5,"+","-")
from
 chromInfo, kgXref
limit 100'
+-----------------------+-----------+-----------+-------+-----+---+
|                  chr1 |  90205667 |  90205686 |   ID1 | 853 | - |
|                  chr2 | 228928707 | 228928768 |   ID2 | 212 | + |
|                  chr3 | 110648557 | 110648565 |   ID3 | 707 | + |
|                  chr4 |  78452996 |  78453009 |   ID4 | 432 | - |
|                  chr5 |  92801311 |  92801339 |   ID5 | 859 | + |
|                  chr6 | 119732169 | 119732182 |   ID6 | 566 | + |
|                  chr7 |  72222004 |  72222102 |   ID7 | 530 | - |
|                  chrX | 153234775 | 153234854 |   ID8 | 980 | - |
|                  chr8 | 108715435 | 108715445 |   ID9 | 290 | + |
|                  chr9 | 117733050 | 117733125 |  ID10 | 227 | - |
|                 chr10 | 108812567 | 108812599 |  ID11 | 214 | + |
|                 chr11 | 113708758 | 113708850 |  ID12 |  76 | - |
|                 chr12 | 115213268 | 115213313 |  ID13 | 672 | + |
|                 chr13 |   1854783 |   1854789 |  ID14 | 260 | + |
|                 chr14 |  86200594 |  86200660 |  ID15 | 913 | - |
|                 chr15 |  12437564 |  12437653 |  ID16 |  87 | - |
|                 chr16 |  51086633 |  51086686 |  ID17 | 958 | + |
|                 chr17 |   8681776 |   8681871 |  ID18 | 413 | + |
|                 chr18 |  68530992 |  68531064 |  ID19 | 963 | - |
|                 chr20 |  25333986 |  25333989 |  ID20 | 968 | - |
|                  chrY |  46155598 |  46155666 |  ID21 |  57 | + |
|                 chr19 |   4811141 |   4811207 |  ID22 |  35 | + |
|                 chr22 |  48293614 |  48293622 |  ID23 | 566 | - |
|                 chr21 |  13476195 |  13476256 |  ID24 | 227 | + |
|        chr6_ssto_hap7 |   3905893 |   3905901 |  ID25 |  15 | - |
|         chr6_mcf_hap5 |    742068 |    742093 |  ID26 | 782 | + |
|         chr6_cox_hap2 |   2243145 |   2243230 |  ID27 | 854 | - |
|        chr6_mann_hap4 |     62703 |     62795 |  ID28 | 570 | + |
|         chr6_apd_hap1 |   3244946 |   3244972 |  ID29 | 212 | + |
|         chr6_qbl_hap6 |   3292785 |   3292861 |  ID30 | 656 | - |
|         chr6_dbb_hap3 |    136354 |    136369 |  ID31 | 662 | - |
|       chr17_ctg5_hap1 |    524967 |    525065 |  ID32 | 976 | - |
|        chr4_ctg9_hap1 |    436008 |    436097 |  ID33 | 246 | - |
|  chr1_gl000192_random |     19603 |     19654 |  ID34 | 462 | - |
|        chrUn_gl000225 |     96553 |     96651 |  ID35 | 533 | - |
|  chr4_gl000194_random |       231 |       316 |  ID36 | 230 | - |
|  chr4_gl000193_random |     68718 |     68816 |  ID37 | 812 | + |
|  chr9_gl000200_random |     29815 |     29859 |  ID38 | 726 | + |
|        chrUn_gl000222 |     67855 |     67944 |  ID39 | 367 | + |
|        chrUn_gl000212 |    131049 |    131052 |  ID40 |  30 | + |
|  chr7_gl000195_random |     46631 |     46638 |  ID41 | 577 | - |
|        chrUn_gl000223 |    119311 |    119338 |  ID42 | 379 | + |
|        chrUn_gl000224 |     43641 |     43740 |  ID43 | 208 | + |
|        chrUn_gl000219 |    136897 |    136956 |  ID44 | 651 | + |
| chr17_gl000205_random |     86356 |     86357 |  ID45 | 542 | - |
|        chrUn_gl000215 |    145143 |    145156 |  ID46 | 115 | + |
|        chrUn_gl000216 |    104561 |    104608 |  ID47 | 504 | + |
|        chrUn_gl000217 |     17846 |     17861 |  ID48 | 438 | - |
|  chr9_gl000199_random |     66489 |     66562 |  ID49 | 498 | + |
|        chrUn_gl000211 |    154678 |    154757 |  ID50 | 176 | - |
|        chrUn_gl000213 |    162227 |    162270 |  ID51 | 191 | - |
|        chrUn_gl000220 |    119794 |    119866 |  ID52 | 354 | - |
|        chrUn_gl000218 |      7951 |      7989 |  ID53 | 756 | - |
| chr19_gl000209_random |    147343 |    147414 |  ID54 | 792 | - |
|        chrUn_gl000221 |    111365 |    111378 |  ID55 | 494 | + |
|        chrUn_gl000214 |    129148 |    129192 |  ID56 | 382 | - |
|        chrUn_gl000228 |    103840 |    103865 |  ID57 | 857 | - |
|        chrUn_gl000227 |      5598 |      5663 |  ID58 | 114 | - |
|  chr1_gl000191_random |     82631 |     82632 |  ID59 | 737 | - |
| chr19_gl000208_random |       801 |       812 |  ID60 | 532 | + |
|  chr9_gl000198_random |      1986 |      2000 |  ID61 | 635 | - |
| chr17_gl000204_random |     70716 |     70725 |  ID62 | 828 | - |
|        chrUn_gl000233 |     40796 |     40879 |  ID63 | 459 | - |
|        chrUn_gl000237 |     32975 |     32989 |  ID64 | 554 | + |
|        chrUn_gl000230 |      1828 |      1847 |  ID65 | 811 | + |
|        chrUn_gl000242 |       969 |      1033 |  ID66 | 130 | - |
|        chrUn_gl000243 |     11318 |     11330 |  ID67 | 807 | - |
|        chrUn_gl000241 |     40739 |     40820 |  ID68 | 148 | + |
|        chrUn_gl000236 |      3592 |      3643 |  ID69 | 303 | - |
|        chrUn_gl000240 |     40872 |     40967 |  ID70 | 840 | + |
| chr17_gl000206_random |      6964 |      7048 |  ID71 | 673 | - |
|        chrUn_gl000232 |     10439 |     10511 |  ID72 | 833 | + |
|        chrUn_gl000234 |     20523 |     20576 |  ID73 | 149 | + |
| chr11_gl000202_random |     10214 |     10300 |  ID74 | 514 | + |
|        chrUn_gl000238 |     18495 |     18527 |  ID75 | 197 | + |
|        chrUn_gl000244 |     22151 |     22220 |  ID76 | 798 | - |
|        chrUn_gl000248 |      6317 |      6323 |  ID77 | 841 | + |
|  chr8_gl000196_random |     21150 |     21218 |  ID78 | 785 | - |
|        chrUn_gl000249 |       440 |       484 |  ID79 | 152 | + |
|        chrUn_gl000246 |     30097 |     30158 |  ID80 | 656 | + |
| chr17_gl000203_random |     13202 |     13239 |  ID81 | 800 | - |
|  chr8_gl000197_random |       630 |       674 |  ID82 | 126 | + |
|        chrUn_gl000245 |      8695 |      8717 |  ID83 | 387 | + |
|        chrUn_gl000247 |      8035 |      8063 |  ID84 | 716 | - |
|  chr9_gl000201_random |     21819 |     21897 |  ID85 |  60 | - |
|        chrUn_gl000235 |     23868 |     23923 |  ID86 | 652 | - |
|        chrUn_gl000239 |      4778 |      4863 |  ID87 | 823 | - |
| chr21_gl000210_random |      9750 |      9757 |  ID88 | 313 | + |
|        chrUn_gl000231 |     21112 |     21196 |  ID89 | 880 | - |
|        chrUn_gl000229 |     14957 |     14971 |  ID90 | 424 | - |
|                  chrM |      4308 |      4327 |  ID91 | 154 | + |
|        chrUn_gl000226 |      8416 |      8437 |  ID92 | 344 | + |
| chr18_gl000207_random |      1887 |      1985 |  ID93 | 518 | - |
|                  chr1 | 189063477 | 189063558 |  ID94 | 752 | + |
|                  chr2 | 111058172 | 111058198 |  ID95 | 911 | - |
|                  chr3 |  38005235 |  38005296 |  ID96 | 451 | + |
|                  chr4 | 160737381 | 160737470 |  ID97 | 918 | - |
|                  chr5 | 158371323 | 158371383 |  ID98 | 370 | + |
|                  chr6 |  25264328 |  25264386 |  ID99 | 477 | - |
|                  chr7 | 117089472 | 117089550 | ID100 | 686 | + |
+-----------------------+-----------+-----------+-------+-----+---+
 
                 
                
                
                 
            
            
         
     
 
         
        
 
    
    
        
            
                  before adding your answer.
         
    
    
         
        
            
        
     
    
    Traffic: 4665 users visited in the last hour
         
    
    
        
    
    
 
Nice solution, Pierre. Just curious, does mysql have a generator function (like Oracle and others) such that you could create an arbitrary number of random intervals (and thus not have to join to a table like kgXref with a limited number of rows)?
To generate a correct half-open BED file (i.e., to avoid the random possibility that the start and stop coordinates are equal), you might add a
1to the stop coordinate, e.g.:@start+ROUND(RAND()*100)+1. Otherwise, this will occasionally generate incorrect BED elements.@Aaron Quick answer: I don't know.
Thx Pierre, brilliant solution like usual