How To Retrieve Selected Region From Each Protein In A Multiple Fasta File
11.9 years ago
friveroll ▴ 60

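I have this list from BLAST results: the first column is the sequence name, the second is the start position, and the third is the end position. I want to retrieve the region between start and end from each protein.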

gi|6226847|sp|Q13324.2|CRFR2_HUMAN 366 411
gi|1345735|sp|P47866.1|CRFR2_RAT 366 411
gi|2495061|sp|Q60748.1|CRFR2_MOUSE 386 431
gi|3023564|sp|O42603.1|CRFR2_XENLA 368 413
gi|3023563|sp|O42602.1|CRFR1_XENLA 370 415
gi|2495062|sp|Q90812.1|CRFR1_CHICK 375 420
gi|3913367|sp|O62772.1|CRFR1_SHEEP 370 415
gi|75053365|sp|Q76LL8.1|CRFR1_MACMU 370 415
gi|544100|sp|P35347.1|CRFR1_MOUSE 370 415
gi|461836|sp|P34998.1|CRFR1_HUMAN 399 444
gi|544101|sp|P35353.1|CRFR1_RAT 370 415
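
For anyone following along, here is a minimal sketch of how each line of the list could be split into its parts before fetching anything. The names `Region` and `parse_regions` are made up for illustration, and it assumes the columns are separated by whitespace and that the GI number is always the second `|`-separated field of the name, as in the list above.

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    """One hit from the list: full name, GI number, and 1-based coordinates."""
    name: str
    gi: str
    start: int
    end: int


def parse_regions(lines: Iterable[str]) -> Iterator[Region]:
    """Yield a Region for every line that has at least three columns."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name = fields[0]
        gi = name.split('|')[1]  # second '|'-separated field is the GI number
        yield Region(name, gi, int(fields[1]), int(fields[2]))


example = [
    "gi|6226847|sp|Q13324.2|CRFR2_HUMAN 366 411",
    "gi|2495061|sp|Q60748.1|CRFR2_MOUSE 386 431",
]
regions = list(parse_regions(example))
for region in regions:
    print(region.gi, region.start, region.end)
```

Splitting each line once and converting the coordinates to integers up front keeps the later fetching or slicing step free of string handling.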

Edit: I finally got it working. Here is the code, inspired by this post:

from Bio import Entrez
from Bio import SeqIO

Entrez.email = 'friveroll@gmail.com'
filename = 'out_region.fasta'

# Each line of list2.txt: "<sequence name> <start> <end>"
with open('list2.txt', 'r') as fd, open(filename, 'a') as out_handle:
    for line in fd:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        name, start, end = fields[:3]
        gi = name.split('|')[1]  # GI number is the second '|'-separated field

        # Fetch only the requested region of the protein
        handle = Entrez.efetch(db='protein',
                               rettype='fasta',
                               retmode='text',
                               seq_start=int(start),
                               seq_stop=int(end),
                               id=gi)
        rec = SeqIO.read(handle, 'fasta')
        handle.close()

        # Append the region to the output FASTA file
        SeqIO.write(rec, out_handle, 'fasta')
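
If the sequences are already in a local multiple FASTA file, the same regions could probably be cut out without querying Entrez at all. The following is only a sketch under that assumption: it uses plain Python rather than Biopython, the helper names (`read_fasta`, `extract_region`, `format_fasta`) and the toy sequence are invented for illustration, and it assumes the first word of each FASTA header matches the name in the list. BLAST coordinates are 1-based and inclusive, so the slice starts at `start - 1`.

```python
def read_fasta(lines):
    """Return a dict mapping the first word of each header to its sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: ''.join(parts) for n, parts in records.items()}


def extract_region(seq, start, end):
    """Return the 1-based, inclusive region start..end of seq."""
    return seq[start - 1:end]


def format_fasta(name, start, end, seq, width=60):
    """Format one region as a FASTA record with the coordinates in the header."""
    header = f">{name}:{start}-{end}"
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body)


# Toy data standing in for a real multiple FASTA file and a hit list
fasta_lines = [">seqA toy protein", "MKTAYIAKQR", "QISFVKSHFS"]
hits = [("seqA", 3, 8)]

sequences = read_fasta(fasta_lines)
for name, start, end in hits:
    region = extract_region(sequences[name], start, end)
    print(format_fasta(name, start, end, region))
```

With a real file, `fasta_lines` would be an open file handle and `hits` would come from the list above.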
biopython blast fasta • 2.5k views

I modified the code, but I can only write the first record to the FASTA file.

11.9 years ago
raunakms ★ 1.1k

Here is my answer to a similar post, though that one uses BioPerl:

http://www.biostars.org/post/show/13659/how-can-i-programmatically-retrieve-the-genbank-records-with-accession-numbers-in-the-form-jn/#18429

There must be a similar GenBank parser in Biopython; just dig into its website.
