Finding a variant's flanking sequence
1
0
Entering edit mode
3.9 years ago
L. A. Liggett ▴ 120

I am looking to improve the speed with which I find genomic sequence that flanks a given variant. Currently for a variant of interest, I use python to pull its chromosome and location, then query the UCSC genome browser for a given number of downstream and upstream bases like this:

check_output('wget -qO- http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=%s:%s,%s' % (chrom,low,high), stderr=STDOUT, shell=True)

This works just fine, especially when only working with a handful of variants. But when working with thousands of variants, this solution becomes quite slow because of the constant queries. So, is there a faster way of accomplishing the same task? I assume there is a more elegant way to use an offline copy of hg19 or something that would eliminate the need to constantly probe the genome browser?

sequence sequencing • 1.0k views
ADD COMMENT
2
Entering edit mode
3.9 years ago

The command you're looking for is samtools faidx

Example: samtools faidx hg19.fa 1:6403804-6403874

ADD COMMENT
0
Entering edit mode

This seems to be what others use, but it doesn't seem to be working for me. I do have hg19.fa.fai within the same directory as hg10.fa, but running samtools faidx hg19.fa 1:6403804-6403874 just outputs >1:6403804-6403874. Am I missing something?

ADD REPLY
2
Entering edit mode

Check if you chromosomes' name start with 'chr' :

head hg19.fa
ADD REPLY
0
Entering edit mode

Beautiful, samtools faidx hg19.fa chr1:6403804-6403874 works perfectly.

ADD REPLY

Login before adding your answer.

Traffic: 1754 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6