I would like to draw a random list of N SNPs from dbSNP/UCSC. If I have a list of HapMap SNPs in BED format, for instance, I can shuffle them and select 1000 at random. Since the placement of SNPs in HapMap is not necessarily representative of all SNPs in the genome, I'd like to do this with dbSNP instead. Short of downloading a BED file of all SNPs in the genome, from which resampling might be computationally intensive, is there an easy way to draw some number of random SNPs from the genome and have them returned in BED format?
To do this with a list of SNPs in a BED file, I currently use the shuf command as shown below. But doing this for the 56M SNPs currently in dbSNP, in order to resample 10k random SNPs multiple times, might be too intensive. Ideas? R, perhaps? Is there any way to do this from the Unix prompt so I can use the output in bedtools?
cat file.bed | head
chr1    235638  235751  13.6663
chr1    237748  237784  6.35761
chr1    521484  521614  10.0359
chr1    565575  566082  7.19007
chr1    567523  567873  10.5674
chr1    568176  568545  5.7313
chr1    569748  570042  652.342
chr1    664708  664756  6.32348
shuf file.bed | head
chr3    138552319       138553474       56.8719
chr12   7695465 7695792 11.469
chr20   23312538        23312926        6.68979
chr14   87802700        87802821        6.09238
chr2    180293340       180293591       4.35159
chr18   60279291        60279551        7.28719
chr19   49068267        49068726        34.7679
chr12   60729653        60729899        20.4301
chr2    30458084        30458522        65.6261
chr12   63695225        63695404        4.89757
                    
                
                
Maybe this could help you: https://code.google.com/p/bedtools/wiki/Usage#shuffleBed
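
One caveat: as far as I know, shuffleBed moves the intervals to random positions in the genome rather than picking existing ones, so it may not be what you want if the goal is to sample real SNPs. If you do end up downloading the dbSNP BED file once, here is a minimal sketch that uses shuf's -n option to draw a fixed number of lines and then sorts the result for bedtools. The input is a small stand-in file with made-up rs IDs (hypothetical, for illustration only), and n is set to 10 instead of 10000 so it runs quickly.

```shell
# Hypothetical stand-in for the real dbSNP BED file (made-up coordinates and IDs).
bed=$(mktemp)
for i in $(seq 1 1000); do
    printf 'chr1\t%d\t%d\trs%d\n' "$i" "$((i + 1))" "$i"
done > "$bed"

n=10   # number of SNPs to draw; this would be 10000 in the question

# shuf -n draws n lines at random without replacement; sorting by chromosome
# and start position gives the order most bedtools commands expect.
sample=$(mktemp)
shuf -n "$n" "$bed" | sort -k1,1 -k2,2n > "$sample"

# Report how many lines were drawn.
wc -l < "$sample"
```

To resample several times, the shuf line could be wrapped in a loop that writes each draw to its own file. Whether this is fast enough on 56M lines is something you would have to try on your machine.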