I am looking to improve the speed with which I find genomic sequence that flanks a given variant. Currently for a variant of interest, I use python to pull its chromosome and location, then query the UCSC genome browser for a given number of downstream and upstream bases like this:
check_output('wget -qO- http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=%s:%s,%s' % (chrom,low,high), stderr=STDOUT, shell=True)
This works just fine, especially when only working with a handful of variants. But when working with thousands of variants, this solution becomes quite slow because of the constant queries. So, is there a faster way of accomplishing the same task? I assume there is a more elegant way to use an offline copy of hg19 or something that would eliminate the need to constantly probe the genome browser?