Question: Biopython Embl Parser Only Reads One Entry
5.9 years ago, sinanugur wrote:

Hello,

I am trying to parse genome records in EMBL format. Everything runs without raising an exception, but the parser only reads the first record. Here is my code to parse the EMBL file:

from Bio import SeqIO

# SeqIO.parse yields one SeqRecord per EMBL record (ID line through "//")
for record in SeqIO.parse("AE000657.1.embl", "embl"):
    print(record.id)



This script only prints AE000657.1.
That is all; the other genomic regions are not printed. The file is available at http://www.ebi.ac.uk/ena/data/view/AE000657&display=text

The EMBL file is OK; in fact, it can be opened in Artemis, so it is not corrupted. So what is the problem here? Thanks


Could you edit your question to include a URL to the test file? Without that this isn't going to be easy to assist you with.

written 5.9 years ago by Peter

Yep, I edited my question.

written 5.9 years ago by sinanugur

Not sure what the issue is. Your file contains one sequence record and the code prints its ID, as expected. Maybe you want FT lines as suggested in Peter's answer?

written 5.9 years ago by Neilfws

5.9 years ago, Peter (Scotland, UK) wrote:

In EMBL, each record starts with an "ID" line and ends with a "//" line, and your EMBL file as shown at http://www.ebi.ac.uk/ena/data/view/AE000657&display=text really does contain only one record. The Biopython parser is therefore working as designed.
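If you want to check how many records a file holds without Biopython, a rough sketch is to count the "ID" and "//" lines directly. The function name count_embl_records and the tiny sample record below are made up for illustration; they are not part of Biopython or of the AE000657 file.

```python
def count_embl_records(lines):
    """Count record start lines ("ID ...") and terminator lines ("//")."""
    starts = 0
    ends = 0
    for line in lines:
        if line.startswith("ID "):
            # Each EMBL record opens with exactly one ID line.
            starts += 1
        elif line.rstrip() == "//":
            # Each EMBL record closes with a bare "//" line.
            ends += 1
    return starts, ends


# Made-up, heavily shortened record used only to exercise the function.
sample = """\
ID   X1; SV 1; linear; genomic DNA; STD; PRO; 12 BP.
FT   source          1..12
SQ   Sequence 12 BP;
     acgtacgtac gt                                                        12
//
"""

print(count_embl_records(sample.splitlines()))
```

Running the same function over open("AE000657.1.embl") should give one start and one end if the file really holds a single record.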

I would guess what you are looking for is the features, i.e. the information on the FT lines (Feature Table). These get parsed into SeqFeature objects in Biopython, held as a list in the features attribute of the SeqRecord object. Note that for single-sequence files, you may find it simpler to use the read function:

from Bio import SeqIO

# SeqIO.read expects exactly one record and raises ValueError otherwise
record = SeqIO.read("AE000657.1.embl", "embl")
print("Record %s has %i features" % (record.id, len(record.features)))
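To go one step further and actually walk over the features, something like the following sketch may help. It tallies feature types (gene, CDS, and so on) using the type attribute of each SeqFeature. The helper names count_feature_types and summarize_embl are hypothetical, chosen for this example; only SeqIO.read, record.features and feature.type come from Biopython.

```python
from collections import Counter


def count_feature_types(record):
    """Tally feature types (e.g. 'gene', 'CDS') for one SeqRecord-like object."""
    return Counter(feature.type for feature in record.features)


def summarize_embl(path):
    """Read a single-record EMBL file; return its ID and feature-type counts."""
    # Imported here so count_feature_types can be used without Biopython.
    from Bio import SeqIO

    record = SeqIO.read(path, "embl")
    return record.id, count_feature_types(record)
```

Calling summarize_embl("AE000657.1.embl") returns the record ID together with a Counter, and looping over counts.most_common() then lists the feature types from most to least frequent.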

Thanks, I wanted to parse the features. I thought I could iterate through them via SeqIO.parse, but now I understand. Cheers.

written 5.9 years ago by sinanugur

Great.

P.S. On BioStars (like StackExchange) you are expected to mark an answer as accepted if it solves your problem - this is used for the user profile ratings etc.

written 5.9 years ago by Peter