4.0 years ago by
If you're willing to try an in-development library, you can try cruzdb. With a script like this:
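from cruzdb import Genome

db = Genome('hg19')
refGene = db.refGene

with open("names.txt") as names:
    for name in (n.strip() for n in names):
        # .one() raises if the accession matches zero rows or more than one.
        gene = refGene.filter_by(name=name).one()
        print(">%s" % name)
        # Assumed call, not in the snippet as posted; check the cruzdb docs.
        print(gene.sequence())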
and a names.txt containing one ID per line, like:
NM_001127388
NM_001127389
This prints FASTA-formatted output by querying the UCSC genome database (refGene table) and fetching the sequence from UCSC's DAS sequence server.
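If you want the output to be a bit tidier, the ID reading and FASTA formatting can be split out into small helpers. This is only a sketch using the standard library: `read_ids` and `fasta_record` are hypothetical names of my own, not part of cruzdb, and the 60-column wrap width is just a common convention.

```python
def read_ids(lines):
    """Return non-empty, de-duplicated IDs, preserving input order."""
    seen = set()
    ids = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            ids.append(name)
    return ids


def fasta_record(name, sequence, width=60):
    """Format one FASTA record: a '>' header line, then wrapped sequence."""
    # Slice the sequence into fixed-width chunks, one per output line.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">%s\n%s" % (name, "\n".join(chunks))
```

You would call `read_ids(open("names.txt"))` to get the accessions, then pass each name and its sequence string to `fasta_record` before printing. Skipping blank lines and duplicates avoids querying the database for the same accession twice.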
If you have a long list, see the notes on the cruzdb page about mirroring the MySQL tables locally.