Question

Using cruzdb to retrieve SNP sequence including flanking


6.9 years ago

yarrowmadrona • 0

I want to use cruzdb to look up a list of SNPs by rs ID, read from a text file, and retrieve the sequence for each SNP including 200 base pairs of flanking sequence on either side. I can do this in the UCSC Table Browser by setting "Output format" to sequence. Below is some code I pieced together from previous posts.

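from cruzdb import Genome
import sys

# Path to a text file with one rs ID per line, given on the command line
file_in = sys.argv[1]
hg19 = Genome(db='hg19')
snp147 = hg19.snp147
with open(file_in) as file_handle:
    for line in file_handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rs = fields[0]  # split() has already removed the trailing newline
        if rs.startswith("rs"):
            print(snp147.filter_by(name=rs).first())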

Unfortunately, the row this returns contains no sequence information. I also came across the SNP sequence table (snp147Seq), but I am not sure how to use it, e.g. hg19.snp147Seq.filter_by(name='rs9923231')

SNP cruzdb UCSC Genome Browser • 1.5k views


score 0 · Answer 1 · 2017-06-02


6.9 years ago

yarrowmadrona • 0

In case anyone is interested, I decided not to use the cruzdb module and to query dbSNP directly instead. Much easier.

https://www.ncbi.nlm.nih.gov/projects/SNP/SNPeutils.htm
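For anyone who would still rather stay with UCSC coordinates, here is a rough sketch (Python 3) of the flanking step, not something I have run against the whole SNP list. It assumes you already have a SNP's chrom, chromStart and chromEnd (the 0-based, half-open columns of the snp147 table) and that the public UCSC REST endpoint at api.genome.ucsc.edu is reachable. The function names flank_window, sequence_url and fetch_sequence are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public UCSC REST API endpoint for raw sequence (assumed reachable)
UCSC_API = "https://api.genome.ucsc.edu/getData/sequence"


def flank_window(chrom_start, chrom_end, flank=200):
    """Widen a 0-based, half-open interval by `flank` bp on each side.

    The start is clamped at 0 so SNPs near the chromosome start still work.
    """
    return max(0, chrom_start - flank), chrom_end + flank


def sequence_url(chrom, start, end, genome="hg19"):
    """Build the request URL for the given genome and interval."""
    query = urlencode(
        {"genome": genome, "chrom": chrom, "start": start, "end": end}
    )
    return UCSC_API + "?" + query


def fetch_sequence(chrom, start, end, genome="hg19"):
    """Download the interval and return the DNA string (needs network)."""
    with urlopen(sequence_url(chrom, start, end, genome)) as response:
        return json.load(response)["dna"]
```

With made-up coordinates, flank_window(1000, 1001) gives (800, 1201), i.e. 401 bases for a single-base SNP, and fetch_sequence("chr16", 800, 1201) would then download that stretch. The end of the window is not clamped to the chromosome length, so SNPs very close to a chromosome end would need an extra check.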
