Using cruzdb to retrieve SNP sequence including flanking regions
1
0
Entering edit mode
6.9 years ago

I want to use cruzdb to look up a list of SNPs by rs ID (one per line in a text file) and retrieve the sequence for each, including 200 base pairs of flanking sequence on either side. I can do this in the UCSC Table Browser by setting "Output format" to "sequence". The code below is something I pieced together from previous posts.

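from cruzdb import Genome
import sys

# Path to a text file with one rs ID per line, e.g. rs_example2.txt
file_in = sys.argv[1]
hg19 = Genome(db='hg19')
snp147 = hg19.snp147
with open(file_in) as file_handle:
    for line in file_handle:
        fields = line.split()
        # Skip blank lines and anything that is not an rs ID
        if not fields or not fields[0].startswith("rs"):
            continue
        # Look up the SNP by the cleaned ID, not the raw line
        print(snp147.filter_by(name=fields[0]).first())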

Unfortunately, the result contains no sequence information. I also came across the SNP sequence table, but I am not sure how to use it: hg19.snp147Seq.filter_by(name='rs9923231')
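For anyone attempting the same thing, here is a minimal sketch of one possible way to get from a SNP's coordinates to its flanking sequence. It is not a cruzdb feature. It assumes Python 3, that you already have the SNP's chrom, chromStart and chromEnd values (the coordinate columns of the snp147 table, 0-based and half-open), and that the UCSC REST API endpoint getData/sequence is available and returns the sequence under a "dna" key. The helper names flank_window, sequence_url and fetch_sequence are made up for illustration.

```python
import json
from urllib.request import urlopen

FLANK = 200  # base pairs of flanking sequence on each side, as in the question


def flank_window(chrom_start, chrom_end, flank=FLANK):
    """Return the 0-based, half-open window covering the SNP plus its flanks.

    The start is clamped at 0; the end is NOT clamped to the chromosome
    length, so a SNP very close to a chromosome end needs extra care.
    """
    start = max(0, chrom_start - flank)
    end = chrom_end + flank
    return start, end


def sequence_url(genome, chrom, start, end):
    """Build the request URL (assumed UCSC REST API endpoint and parameters)."""
    return ("https://api.genome.ucsc.edu/getData/sequence"
            "?genome={0};chrom={1};start={2};end={3}".format(
                genome, chrom, start, end))


def fetch_sequence(genome, chrom, start, end):
    """Download the sequence; assumes the JSON response has a 'dna' field."""
    with urlopen(sequence_url(genome, chrom, start, end)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["dna"]
```

With a row returned by the snp147 query, the call would look something like start, end = flank_window(row.chromStart, row.chromEnd) followed by fetch_sequence('hg19', row.chrom, start, end). The attribute names on the row are assumed to match the table's column names.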

SNP cruzdb UCSC Genome Browser • 1.5k views
6.9 years ago

In case anyone is interested, I decided not to use the cruzdb module and to query dbSNP directly instead. Much easier.

https://www.ncbi.nlm.nih.gov/projects/SNP/SNPeutils.htm
