Create Random Bed File
1
3
Entering edit mode
12.3 years ago

Hello everyone,

Is there a rapid way to create a random BED file taking in consideration the genome version using MySQL access to UCSC ? No criteria here except the length of the fragments that we can call L

Thanks

Rad

ucsc bed • 2.8k views
ADD COMMENT
8
Entering edit mode
12.3 years ago

the table chromInfo contains the length of the chromosomes. In the following query I only use another table (kgXref) to amplify the number of rows.

$ mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
set @rank:=0;
select chrom,
 @start:=ROUND(RAND()*(size-100)),
 @start+ROUND(RAND()*100),
 concat("ID",@rank:=@rank+1),
 ROUND(RAND()*1000),
 IF(RAND()<0.5,"+","-")
from
 chromInfo, kgXref
limit 100'

+-----------------------+-----------+-----------+-------+-----+---+
|                  chr1 |  90205667 |  90205686 |   ID1 | 853 | - |
|                  chr2 | 228928707 | 228928768 |   ID2 | 212 | + |
|                  chr3 | 110648557 | 110648565 |   ID3 | 707 | + |
|                  chr4 |  78452996 |  78453009 |   ID4 | 432 | - |
|                  chr5 |  92801311 |  92801339 |   ID5 | 859 | + |
|                  chr6 | 119732169 | 119732182 |   ID6 | 566 | + |
|                  chr7 |  72222004 |  72222102 |   ID7 | 530 | - |
|                  chrX | 153234775 | 153234854 |   ID8 | 980 | - |
|                  chr8 | 108715435 | 108715445 |   ID9 | 290 | + |
|                  chr9 | 117733050 | 117733125 |  ID10 | 227 | - |
|                 chr10 | 108812567 | 108812599 |  ID11 | 214 | + |
|                 chr11 | 113708758 | 113708850 |  ID12 |  76 | - |
|                 chr12 | 115213268 | 115213313 |  ID13 | 672 | + |
|                 chr13 |   1854783 |   1854789 |  ID14 | 260 | + |
|                 chr14 |  86200594 |  86200660 |  ID15 | 913 | - |
|                 chr15 |  12437564 |  12437653 |  ID16 |  87 | - |
|                 chr16 |  51086633 |  51086686 |  ID17 | 958 | + |
|                 chr17 |   8681776 |   8681871 |  ID18 | 413 | + |
|                 chr18 |  68530992 |  68531064 |  ID19 | 963 | - |
|                 chr20 |  25333986 |  25333989 |  ID20 | 968 | - |
|                  chrY |  46155598 |  46155666 |  ID21 |  57 | + |
|                 chr19 |   4811141 |   4811207 |  ID22 |  35 | + |
|                 chr22 |  48293614 |  48293622 |  ID23 | 566 | - |
|                 chr21 |  13476195 |  13476256 |  ID24 | 227 | + |
|        chr6_ssto_hap7 |   3905893 |   3905901 |  ID25 |  15 | - |
|         chr6_mcf_hap5 |    742068 |    742093 |  ID26 | 782 | + |
|         chr6_cox_hap2 |   2243145 |   2243230 |  ID27 | 854 | - |
|        chr6_mann_hap4 |     62703 |     62795 |  ID28 | 570 | + |
|         chr6_apd_hap1 |   3244946 |   3244972 |  ID29 | 212 | + |
|         chr6_qbl_hap6 |   3292785 |   3292861 |  ID30 | 656 | - |
|         chr6_dbb_hap3 |    136354 |    136369 |  ID31 | 662 | - |
|       chr17_ctg5_hap1 |    524967 |    525065 |  ID32 | 976 | - |
|        chr4_ctg9_hap1 |    436008 |    436097 |  ID33 | 246 | - |
|  chr1_gl000192_random |     19603 |     19654 |  ID34 | 462 | - |
|        chrUn_gl000225 |     96553 |     96651 |  ID35 | 533 | - |
|  chr4_gl000194_random |       231 |       316 |  ID36 | 230 | - |
|  chr4_gl000193_random |     68718 |     68816 |  ID37 | 812 | + |
|  chr9_gl000200_random |     29815 |     29859 |  ID38 | 726 | + |
|        chrUn_gl000222 |     67855 |     67944 |  ID39 | 367 | + |
|        chrUn_gl000212 |    131049 |    131052 |  ID40 |  30 | + |
|  chr7_gl000195_random |     46631 |     46638 |  ID41 | 577 | - |
|        chrUn_gl000223 |    119311 |    119338 |  ID42 | 379 | + |
|        chrUn_gl000224 |     43641 |     43740 |  ID43 | 208 | + |
|        chrUn_gl000219 |    136897 |    136956 |  ID44 | 651 | + |
| chr17_gl000205_random |     86356 |     86357 |  ID45 | 542 | - |
|        chrUn_gl000215 |    145143 |    145156 |  ID46 | 115 | + |
|        chrUn_gl000216 |    104561 |    104608 |  ID47 | 504 | + |
|        chrUn_gl000217 |     17846 |     17861 |  ID48 | 438 | - |
|  chr9_gl000199_random |     66489 |     66562 |  ID49 | 498 | + |
|        chrUn_gl000211 |    154678 |    154757 |  ID50 | 176 | - |
|        chrUn_gl000213 |    162227 |    162270 |  ID51 | 191 | - |
|        chrUn_gl000220 |    119794 |    119866 |  ID52 | 354 | - |
|        chrUn_gl000218 |      7951 |      7989 |  ID53 | 756 | - |
| chr19_gl000209_random |    147343 |    147414 |  ID54 | 792 | - |
|        chrUn_gl000221 |    111365 |    111378 |  ID55 | 494 | + |
|        chrUn_gl000214 |    129148 |    129192 |  ID56 | 382 | - |
|        chrUn_gl000228 |    103840 |    103865 |  ID57 | 857 | - |
|        chrUn_gl000227 |      5598 |      5663 |  ID58 | 114 | - |
|  chr1_gl000191_random |     82631 |     82632 |  ID59 | 737 | - |
| chr19_gl000208_random |       801 |       812 |  ID60 | 532 | + |
|  chr9_gl000198_random |      1986 |      2000 |  ID61 | 635 | - |
| chr17_gl000204_random |     70716 |     70725 |  ID62 | 828 | - |
|        chrUn_gl000233 |     40796 |     40879 |  ID63 | 459 | - |
|        chrUn_gl000237 |     32975 |     32989 |  ID64 | 554 | + |
|        chrUn_gl000230 |      1828 |      1847 |  ID65 | 811 | + |
|        chrUn_gl000242 |       969 |      1033 |  ID66 | 130 | - |
|        chrUn_gl000243 |     11318 |     11330 |  ID67 | 807 | - |
|        chrUn_gl000241 |     40739 |     40820 |  ID68 | 148 | + |
|        chrUn_gl000236 |      3592 |      3643 |  ID69 | 303 | - |
|        chrUn_gl000240 |     40872 |     40967 |  ID70 | 840 | + |
| chr17_gl000206_random |      6964 |      7048 |  ID71 | 673 | - |
|        chrUn_gl000232 |     10439 |     10511 |  ID72 | 833 | + |
|        chrUn_gl000234 |     20523 |     20576 |  ID73 | 149 | + |
| chr11_gl000202_random |     10214 |     10300 |  ID74 | 514 | + |
|        chrUn_gl000238 |     18495 |     18527 |  ID75 | 197 | + |
|        chrUn_gl000244 |     22151 |     22220 |  ID76 | 798 | - |
|        chrUn_gl000248 |      6317 |      6323 |  ID77 | 841 | + |
|  chr8_gl000196_random |     21150 |     21218 |  ID78 | 785 | - |
|        chrUn_gl000249 |       440 |       484 |  ID79 | 152 | + |
|        chrUn_gl000246 |     30097 |     30158 |  ID80 | 656 | + |
| chr17_gl000203_random |     13202 |     13239 |  ID81 | 800 | - |
|  chr8_gl000197_random |       630 |       674 |  ID82 | 126 | + |
|        chrUn_gl000245 |      8695 |      8717 |  ID83 | 387 | + |
|        chrUn_gl000247 |      8035 |      8063 |  ID84 | 716 | - |
|  chr9_gl000201_random |     21819 |     21897 |  ID85 |  60 | - |
|        chrUn_gl000235 |     23868 |     23923 |  ID86 | 652 | - |
|        chrUn_gl000239 |      4778 |      4863 |  ID87 | 823 | - |
| chr21_gl000210_random |      9750 |      9757 |  ID88 | 313 | + |
|        chrUn_gl000231 |     21112 |     21196 |  ID89 | 880 | - |
|        chrUn_gl000229 |     14957 |     14971 |  ID90 | 424 | - |
|                  chrM |      4308 |      4327 |  ID91 | 154 | + |
|        chrUn_gl000226 |      8416 |      8437 |  ID92 | 344 | + |
| chr18_gl000207_random |      1887 |      1985 |  ID93 | 518 | - |
|                  chr1 | 189063477 | 189063558 |  ID94 | 752 | + |
|                  chr2 | 111058172 | 111058198 |  ID95 | 911 | - |
|                  chr3 |  38005235 |  38005296 |  ID96 | 451 | + |
|                  chr4 | 160737381 | 160737470 |  ID97 | 918 | - |
|                  chr5 | 158371323 | 158371383 |  ID98 | 370 | + |
|                  chr6 |  25264328 |  25264386 |  ID99 | 477 | - |
|                  chr7 | 117089472 | 117089550 | ID100 | 686 | + |
+-----------------------+-----------+-----------+-------+-----+---+
ADD COMMENT
2
Entering edit mode

Nice solution, Pierre. Just curious, does mysql have a generator function (like Oracle and others) such that you could create an arbitrary number of random intervals (and thus not have to join to a table like kgXref with a limited number of rows)?

ADD REPLY
2
Entering edit mode

To generate a correct half-open BED file (i.e., to avoid the random possibility that the start and stop coordinates are equal), you might add a 1 to the stop coordinate, e.g.: @start+ROUND(RAND()*100)+1. Otherwise, this will occasionally generate incorrect BED elements.

ADD REPLY
1
Entering edit mode

@Aaron Quick answer: I don't know.

ADD REPLY
0
Entering edit mode

Thx Pierre, brilliant solution like usual

ADD REPLY

Login before adding your answer.

Traffic: 1764 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6