Unextractable EMBL Features for SeqIO.parse
10.1 years ago
blaise.li ▴ 10

I would like to extract information from some of the "SO_feature" features of the following file: https://raw.github.com/cbergman/transposons/master/current/transposon_sequence_set.embl.txt

Is it normal that when I parse the file using Biopython, no features get associated with the records? I obtain the records as follows:

from Bio import SeqIO
record = next(SeqIO.parse("transposon_sequence_set.embl.txt", "embl"))

More generally, what determines whether Biopython can extract a feature?

biopython feature • 2.4k views

Have you tried splitting the file into individual records and parsing each one separately?


The features list is still empty when I apply SeqIO.read() to a file containing only the first 188 lines of the original file (which should be the first record). Other attributes, such as annotations or dbxrefs, seem normal. This is my first time reading an EMBL-formatted file, so I thought I had just made some basic usage error with Biopython. But maybe the records are not well formatted, or maybe there are limitations in Biopython's EMBL parser.

10.1 years ago
Peter 6.0k

Where did this file come from? It does not look like a real EMBL file - for a start it is missing the feature table header (which is indirectly why the parser seems to have ignored your features):

FH   Key             Location/Qualifiers
FH
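For a multi-record file, adding the header by hand gets tedious. Here is a minimal sketch of doing it in plain Python, assuming each record ends with a "//" line and that the header belongs directly before the first FT line of a record. The function name add_feature_headers is made up for illustration; it is not part of Biopython.

```python
# The two feature table header lines, copied from the answer above.
FEATURE_HEADER = ["FH   Key             Location/Qualifiers", "FH"]


def add_feature_headers(lines):
    """Return a copy of `lines` with the FH header inserted before the
    first FT line of every record that does not already have one.

    `lines` is a list of strings without trailing newlines.
    Assumption: records are terminated by a line starting with "//".
    """
    out = []
    has_header = False
    for line in lines:
        if line.startswith("//"):
            # End of record: the next record needs its own header.
            has_header = False
        elif line.startswith("FH"):
            has_header = True
        elif line.startswith("FT") and not has_header:
            out.extend(FEATURE_HEADER)
            has_header = True
        out.append(line)
    return out
```

It could be applied with something like add_feature_headers(text.splitlines()) and the result joined with newlines before writing it to a new file. This only restores the header; it does nothing about problems in the feature lines themselves.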

If I add that manually, then Biopython complains:

BiopythonParserWarning: Overindented SO_feature feature?
BiopythonParserWarning: Couldn't parse feature location: 'five_prime_LTR;SO:0000425:1..600'
BiopythonParserWarning: Couldn't parse feature location: 'three_prime_LTR;SO:0000426:6841..7411'
BiopythonParserWarning: Couldn't parse feature location: 'CDS;SO:0000316:<988..2031'
BiopythonParserWarning: Couldn't parse feature location: 'CDS;SO:0000316:<1950..5402'
BiopythonParserWarning: Couldn't parse feature location: 'CDS;SO:0000316:5248..6780'

All the feature locations are malformed: SO_feature seems to have been inserted where the feature key belongs, pushing the real feature type (e.g. CDS) to the right, into the location field.
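If you want to repair the lines rather than regenerate the file, something along these lines might work. It is only a sketch: it assumes each bad line has the shape "FT   SO_feature   <type>;<SO id>:<location>", which is inferred from the warnings above rather than checked against the whole file. The function name repair_feature_line is hypothetical, and keeping the SO identifier in a /note qualifier is my own choice. I have not verified that the parser accepts every resulting feature key.

```python
import re

# Assumed layout of a malformed line, inferred from the parser warnings:
#   FT   SO_feature   CDS;SO:0000316:<988..2031
SO_LINE = re.compile(
    r"^FT\s+SO_feature\s+(?P<key>[^;\s]+);(?P<so>SO:\d+):(?P<loc>\S+)\s*$"
)


def repair_feature_line(line):
    """Rewrite one malformed SO_feature line as a standard feature line
    followed by a /note qualifier holding the SO identifier.

    Lines that do not match the assumed layout are returned unchanged.
    Always returns a list of lines (without trailing newlines).
    """
    match = SO_LINE.match(line)
    if match is None:
        return [line]
    # Feature key starts in column 6, location in column 22.
    key_line = "FT   " + match.group("key").ljust(16) + match.group("loc")
    # Qualifiers start in column 22 as well.
    note_line = "FT" + " " * 19 + '/note="' + match.group("so") + '"'
    return [key_line, note_line]
```

Because the function returns a list, the repaired lines can be collected with a loop that extends an output list for each input line.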
