Hi everyone,
I have a file with about 77,000 3'-UTR regions, and I used Entrez.efetch to get the sequence of each region. I find the speed slow (about 0.5 sec per sequence).
My code looks like this:
from Bio import Entrez, SeqIO
from Bio.SeqRecord import SeqRecord
Entrez.email = "A.N.Other@example.com"    # set once, outside the loop

# utr3hg19.txt contains all human 3'-UTR coordinates, one 3'-UTR per line;
# tab-separated columns 2,3,4,5 hold gi, strand, start, end
with open("utr3hg19.txt") as f:
    data = f.readlines()

with open("utr3.fasta", "w") as out:      # sequences will be written into this file
    for line in data[1:]:                 # skip the header line
        temp = line.rstrip("\n").split("\t")
        handle = Entrez.efetch(db="nucleotide",
                               id=temp[1],
                               rettype="fasta",
                               strand=temp[2],
                               seq_start=int(temp[3]),  # start is column 4
                               seq_stop=int(temp[4]))   # end is column 5
        record = SeqIO.read(handle, "fasta")
        handle.close()
        r = SeqRecord(record.seq, id=line.strip(), description="")
        SeqIO.write(r, out, "fasta")
Is this due to my bad coding, or is it a network problem? BTW, I run the code on a 12-core Linux server.
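For what it's worth, one likely factor: Bio.Entrez deliberately throttles requests to at most three per second to respect NCBI's usage policy, so roughly a third of a second of each fetch is enforced waiting rather than your code. A minimal sketch, assuming you register your own NCBI API key (the key string here is a placeholder) to raise that limit, plus a small hypothetical helper in case your strand column uses '+'/'-' (efetch expects the integer codes 1/2):

```python
# Assumption: Biopython >= 1.70, which supports Entrez.api_key.
# from Bio import Entrez
# Entrez.email = "A.N.Other@example.com"
# Entrez.api_key = "MY_NCBI_API_KEY"   # placeholder key; lifts the limit to ~10 requests/s

def map_strand(symbol):
    """Translate a strand column value to the 1 (plus) / 2 (minus) codes efetch expects."""
    return {"+": 1, "-": 2, "1": 1, "2": 2}[str(symbol).strip()]
```

Even with an API key, 77,000 single-sequence round trips are dominated by network latency, so extracting the regions from a locally stored genome will always be much faster.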
+1 and also see this post: Batch Fetching Fasta Sequences From Bed File
Thanks a lot, Peter. I used to search locally. Yesterday I suddenly wondered whether this could be done with Biopython from NCBI, in case I meet a species whose genome is not stored locally (I'm too lazy to download the genome :( )
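For anyone landing here later: when the genome is available locally as a FASTA file, slicing it directly avoids the network entirely. A minimal sketch, assuming 1-based inclusive coordinates and '+'/'-' strands (the file and function names are made up for illustration):

```python
from Bio import SeqIO

def extract_region(genome, chrom, start, end, strand):
    """Return the [start, end] region (1-based, inclusive) from an indexed FASTA,
    reverse-complemented when strand is '-'."""
    seq = genome[chrom].seq[start - 1:end]   # shift to 0-based half-open slicing
    return str(seq.reverse_complement() if strand == "-" else seq)

# Hypothetical usage: index the genome once, then slice each 3'-UTR locally.
# genome = SeqIO.index("hg19.fa", "fasta")
# utr = extract_region(genome, "chr1", 1000, 1200, "-")
```

SeqIO.index only parses one record into memory at a time, so a full genome FASTA is fine; build the index once and reuse it across all 77,000 regions rather than reopening it per call.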