I have a large list of chromosomal regions for which I would like to generate +/-10 bp of flanking sequence surrounding all SNPs for all alleles that are found in these regions. A hypothetical output for two alleles found in a given range on chromosome 1 might look like this:
>chr1|1234500|G
AAAAAAAAAAGAAAAAAAAAA
>chr1|1234500|C
AAAAAAAAAACAAAAAAAAAA
Normally, I generate an XML query and download the data from BioMart, but lately my query always hangs (BioMart seems unusually slow). Does anyone know how I can generate the sequences for all alleles of all SNPs found within a given range using something other than BioMart? Thanks in advance for any help you can provide.
I should also mention that another solution that would work perfectly well would be a way to download all the DB132 SNPs with flanking sequences. From there I can whip up a script to keep only the sequences that fall within my regions of interest.
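For reference, here is a minimal Python sketch of the kind of filtering script I have in mind. It assumes the reference sequence and the SNP alleles are already loaded in memory; the function names (in_regions, allele_records), the seq_start parameter, and the toy data are all made up for illustration and do not come from any existing tool.

```python
# Sketch only: names and in-memory inputs are hypothetical.

def in_regions(chrom, pos, regions):
    """True if a 1-based position falls in any (chrom, start, end) region, ends inclusive."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def allele_records(chrom, pos, alleles, ref_seq, seq_start=1, flank=10):
    """Return one (header, sequence) pair per allele of a SNP.

    ref_seq is a stretch of reference sequence whose first base sits at the
    1-based coordinate seq_start on chrom.
    """
    idx = pos - seq_start  # 0-based index of the SNP within ref_seq
    if not 0 <= idx < len(ref_seq):
        raise ValueError("SNP position lies outside the supplied sequence")
    left = ref_seq[max(0, idx - flank):idx]
    right = ref_seq[idx + 1:idx + 1 + flank]
    return [(f">{chrom}|{pos}|{a}", left + a + right) for a in alleles]


# Toy data: one region of interest, two SNPs (the second is outside the region).
regions = [("chr1", 1234000, 1235000)]
snps = [("chr1", 1234500, ["G", "C"]), ("chr1", 2000000, ["A", "T"])]
ref_seq = "A" * 41  # toy reference covering chr1:1234480-1234520

for chrom, pos, alleles in snps:
    if in_regions(chrom, pos, regions):
        for header, seq in allele_records(chrom, pos, alleles, ref_seq, seq_start=1234480):
            print(header)
            print(seq)
```

The linear scan over regions is fine for a short list; for a large list of regions something like a sorted or interval-tree lookup would probably be needed.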
Thanks Aaron. I tried Pierre's answer and it seems to work, but I definitely didn't get all the DB132 SNPs, only about 20K. I'm not familiar with bedtools, so thanks for suggesting it. Do I need to provide the snps.bed file?
bedtools is great for this sort of job. You would need to get your SNPs into a format that bedtools can work with: VCF, BED, or GFF.
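As a rough illustration of getting SNPs into BED, here is a small Python sketch that turns a 1-based SNP position into a BED interval (0-based start, end exclusive) covering the SNP plus 10 bp on each side, clamped to the chromosome ends. The function names, the name column layout, and the chromosome size are placeholders, not anything bedtools requires. The resulting file could then be handed to bedtools for sequence extraction.

```python
# Sketch only: function names and the toy chromosome size are hypothetical.

def snp_to_bed_interval(chrom, pos, chrom_sizes, flank=10):
    """Return (chrom, start, end) in BED coordinates for pos +/- flank.

    pos is 1-based; BED start is 0-based and the end is exclusive.
    The interval is clamped so it never runs off either end of the chromosome.
    """
    start = max(0, pos - 1 - flank)
    end = min(chrom_sizes[chrom], pos + flank)
    return chrom, start, end


def bed_line(chrom, pos, chrom_sizes, flank=10):
    """Format one tab-delimited BED line, keeping the SNP position in the name column."""
    c, start, end = snp_to_bed_interval(chrom, pos, chrom_sizes, flank)
    return f"{c}\t{start}\t{end}\t{chrom}|{pos}"


chrom_sizes = {"chr1": 2_000_000}  # toy size, not the real hg19 length
print(bed_line("chr1", 1234500, chrom_sizes))
```

Note that this only gives reference sequence for each window; the alternate alleles still have to be substituted at the middle position afterwards.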
Thanks Aaron. One additional question: I installed bedtools via Homebrew and can't seem to locate the hg19.genome file. The docs say, "BEDTools includes predefined genome files for human and mouse in the /genomes directory included in the BEDTools distribution." Do you know what directory these files reside in? Thanks.
I haven't installed it with Homebrew, but you can get them from the source code: http://code.google.com/p/bedtools/downloads/detail?name=BEDTools.v2.14.3.tar.gz&can=2&q=
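If the bundled file can't be found, a genome file is just a tab-delimited list of chromosome names and sizes, so one can be rebuilt from a reference FASTA. The Python sketch below shows one way to do that, assuming you have the reference FASTA locally; the function names are made up, and the toy FASTA lines stand in for a real file.

```python
# Sketch only: function names are hypothetical; feed it lines from your own FASTA.

def chrom_sizes_from_fasta(lines):
    """Return {sequence name: length} from an iterable of FASTA lines.

    The name is the header text up to the first whitespace.
    """
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def genome_file_text(sizes):
    """Format sizes as tab-delimited 'name<TAB>size' lines, one per sequence."""
    return "".join(f"{name}\t{size}\n" for name, size in sizes.items())


toy_fasta = [">chr1 toy record", "ACGT", "AC", ">chr2", "GGG"]
print(genome_file_text(chrom_sizes_from_fasta(toy_fasta)), end="")
```

With a real file you would pass an open file handle instead of the toy list and write the result to something like hg19.genome.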
Thanks Aaron. I did eventually get it working with Homebrew. Just so you're aware, there seems to be an issue with the location of the genome directory when installed via Homebrew. This is a fantastic tool! Thanks for making me aware of its existence.