Question: Extracting more features from EMBL files with Biopython
Lina F (Boston, MA) wrote 3.7 years ago:

Hi all,

I downloaded .embl files from The SEED and am trying to extract features from them using Biopython.

For example, from the following excerpt of an embl file, I'm trying to get the line that contains the /product string:

ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 9430 BP.
XX
AC   unknown;
XX
DE   Contig AMTS01000351 from Escherichia coli FDA506
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..9430
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon: 1005474"
FT                   /genome_md5="b6a2d1d1a41be1cf3128536aecba12be"
FT                   /project="mshukla_1005474"
FT                   /genome_id="1005474.3"
FT                   /organism="Escherichia coli FDA506"
FT   CDS             154..432
FT                   /db_xref="SEED:fig|1005474.3.peg.3831"
FT                   /translation="MKTKIVKGKTTKQDVLASFGEPDSRSLIDGEEQWSYTMYNSQSKA
FT                   TSFIPVVGLLAGGADSQTKSLTVSFKGEKVSTYIFNAGTSNVKTGIF"
FT                   /product="hypothetical lipoprotein"
...

 

I've been using SeqIO.parse to get sequence records and looking at record.features, but that's not giving me the /product string:

import sys
from Bio import SeqIO

for record in SeqIO.parse(sys.argv[1], "embl"):
    print(record.id, record.features)

The output is something like this:

unknown.1 [SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(9430), strand=1), type='source'), SeqFeature(FeatureLocation(ExactPosition(153), ExactPosition(432), strand=1), type='CDS'), SeqFeature(FeatureLocation(ExactPosition(507), ExactPosition(1710), strand=-1), type='CDS'), 
...

 

I think there is a way to do it in BioPerl, but what's the equivalent for Biopython?

Thanks for any advice you might have!

mgalactus (United Kingdom) wrote 3.4 years ago:

Hi,

The 'product' annotation is stored in each SeqFeature object's 'qualifiers' dictionary, where every value is a list of strings:

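from Bio import SeqIO

# SeqIO.read expects exactly one record; use SeqIO.parse for multi-record files
s = SeqIO.read('input.embl', 'embl')

for feature in s.features:
    # prints a list such as ['hypothetical lipoprotein'], or [] if there is no /product
    print(feature.qualifiers.get('product', []))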

You can also filter the SeqFeature objects by type, for example by checking feature.type == 'CDS'.
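
For a file with several records, like the SeqIO.parse loop in the question, the qualifier lookup and the type filter can be combined roughly as follows. This is only a sketch and has not been run against SEED exports: `cds_products` and `print_cds_products` are hypothetical helper names, and it assumes you want one line per CDS with the first /product value (or None when the qualifier is missing).

```python
def cds_products(records):
    """Yield (record id, location, product) for every CDS feature.

    `records` is any iterable of SeqRecord-like objects, for example
    the iterator returned by SeqIO.parse(path, "embl").
    """
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue  # skip 'source' and other non-CDS features
            # Qualifier values are lists of strings; take the first if present.
            products = feature.qualifiers.get("product", [])
            product = products[0] if products else None
            yield record.id, str(feature.location), product


def print_cds_products(path):
    """Hypothetical convenience wrapper: print one tab-separated line per CDS."""
    # Imported here so that cds_products itself has no Biopython dependency.
    from Bio import SeqIO

    for record_id, location, product in cds_products(SeqIO.parse(path, "embl")):
        print(record_id, location, product, sep="\t")
```

Calling print_cds_products(sys.argv[1]) from a script would then list every CDS in the file given on the command line. The same pattern should work for other qualifiers such as 'db_xref' or 'translation' by changing the key.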

Hope this helps...
