I want to use cruzdb to query a list of SNPs by rs id in a text file and retrieve sequence including 200 basepairs flanking each SNP. I can do this in the UCSC genome browser table by selecting "Output format" = sequence. I have some code below that I sketched together from previous posts.
from cruzdb import Genome import sys file_in = sys.argv file_handle = open("rs_example2.txt", 'rb') hg19 = Genome(db = 'hg19') snp147 = hg19.snp147 for rs in file_handle: rs.split().strip('\n') if rs.startswith("rs"): print snp147.filter_by(name=rs).first()
Unfortunately, there is no sequence information here. I also ran across the snp sequence database but not sure how to use it. hg19.snp147Seq.filter_by(name='rs9923231')