I want to use cruzdb to query a list of SNPs by rs id in a text file and retrieve sequence including 200 basepairs flanking each SNP. I can do this in the UCSC genome browser table by selecting "Output format" = sequence. I have some code below that I sketched together from previous posts.
from cruzdb import Genome
import sys
file_in = sys.argv[1]
file_handle = open("rs_example2.txt", 'rb')
hg19 = Genome(db = 'hg19')
snp147 = hg19.snp147
for rs in file_handle:
rs.split()[0].strip('\n')
if rs.startswith("rs"):
print snp147.filter_by(name=rs).first()
Unfortunately, there is no sequence information here. I also ran across the snp sequence database but not sure how to use it. hg19.snp147Seq.filter_by(name='rs9923231')