Question: Extracting more features from EMBL files with Biopython
Asked 4.2 years ago by Lina F (Boston, MA):

Hi all,

I downloaded .embl files from The SEED and am trying to extract features from them using Biopython.

For example, from the following excerpt of an embl file, I'm trying to get the line that contains the /product string:

ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 9430 BP.
XX
AC   unknown;
XX
DE   Contig AMTS01000351 from Escherichia coli FDA506
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..9430
FT                   /mol_type="genomic DNA"
FT                   /db_xref="taxon: 1005474"
FT                   /genome_md5="b6a2d1d1a41be1cf3128536aecba12be"
FT                   /project="mshukla_1005474"
FT                   /genome_id="1005474.3"
FT                   /organism="Escherichia coli FDA506"
FT   CDS             154..432
FT                   /db_xref="SEED:fig|1005474.3.peg.3831"
FT                   /translation="MKTKIVKGKTTKQDVLASFGEPDSRSLIDGEEQWSYTMYNSQSKA
FT                   TSFIPVVGLLAGGADSQTKSLTVSFKGEKVSTYIFNAGTSNVKTGIF"
FT                   /product="hypothetical lipoprotein"
...

 

I've been using SeqIO.parse to get sequence records and looking at record.features, but that's not giving me the /product string:

import sys
from Bio import SeqIO
for record in SeqIO.parse(sys.argv[1], "embl"):
    print(record.id, record.features)

The output is something like this:

unknown.1 [SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(9430), strand=1), type='source'), SeqFeature(FeatureLocation(ExactPosition(153), ExactPosition(432), strand=1), type='CDS'), SeqFeature(FeatureLocation(ExactPosition(507), ExactPosition(1710), strand=-1), type='CDS'), 
...

 

I think there is a way to do it in Bioperl, but what's the equivalent for Biopython?

Thanks for any advice you might have!

Answered 3.8 years ago by mgalactus (United Kingdom):
Hi,

The 'product' annotation is stored in each SeqFeature object's 'qualifiers' dictionary, which maps each qualifier name to a list of values:

from Bio import SeqIO

s = SeqIO.read('input.embl', 'embl')

for feature in s.features:
    print(feature.qualifiers.get('product', []))

You can also filter the SeqFeature objects by type, for example by keeping only features where feature.type == 'CDS'.
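Putting the two ideas together, here is a minimal sketch that keeps only CDS features and pulls out their /product value. The helper names cds_products and main, and the "(no product)" placeholder, are made up for illustration. It uses SeqIO.parse rather than SeqIO.read, on the assumption that a SEED export may hold more than one contig per file (SeqIO.read raises an error unless the file contains exactly one record).

```python
def cds_products(records):
    """Yield (record_id, location, product) for every CDS feature.

    `records` is any iterable of SeqRecord-like objects, e.g. the
    iterator returned by SeqIO.parse(path, "embl").
    """
    for record in records:
        for feature in record.features:
            if feature.type != "CDS":
                continue
            # Qualifier values are lists of strings; take the first entry,
            # or a placeholder (hypothetical) when /product is absent.
            product = feature.qualifiers.get("product", ["(no product)"])[0]
            yield record.id, str(feature.location), product


def main(path):
    # Imported here so cds_products() can be reused without Biopython loaded.
    from Bio import SeqIO

    for record_id, location, product in cds_products(SeqIO.parse(path, "embl")):
        print(record_id, location, product, sep="\t")
```

Calling main('input.embl') should print one tab-separated line per CDS. The other qualifiers from the excerpt in the question (db_xref, translation) can be read from feature.qualifiers the same way.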

Hope this helps...
