How To Add Partial Cds Seqfeature In Biopython?
1
0
Entering edit mode
10.8 years ago
Leszek 4.2k

I'm trying to create partial CDS SeqFeature in BioPython:

from Bio.SeqFeature import SeqFeature, FeatureLocation, AfterPosition, BeforePosition

sf = SeqFeature(FeatureLocation(start,end), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))
#mark as partial
if aa[0]!="M":
    sf.location.start.position = BeforePosition(start)
if aa[-1]!="*":
    sf.location.end.position = AfterPosition(end)
r.features.append(sf)

but when I write embl record I got an error:

output.write(r.format('embl'))
File "/usr/lib/pymodules/python2.7/Bio/SeqFeature.py", line 663, in __eq__
    "We can only do comparisons between Biopython Position objects."
AssertionError: We can only do comparisons between Biopython Position objects.

If I first create partial SeqFeature, then I'm unable to translate it:

sf = SeqFeature(FeatureLocation(start,AfterPosition(end)), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))

File "/usr/lib/pymodules/python2.7/Bio/SeqFeature.py", line 627, in <lambda>
    fget=lambda self: self._end.position + self._end.extension,
TypeError: unsupported operand type(s) for +: 'AfterPosition' and 'int'

Anyone has an idea how to do it correctly in BioPython?

biopython cds python • 3.2k views
ADD COMMENT
0
Entering edit mode

The test for the first amino acid being M is wrong, you should be checking the first three bases are a start codon. Non-standard start codons become M when used as start codon, but would usually translate as something else (e.g. Valine).

ADD REPLY
0
Entering edit mode
10.8 years ago
Leszek 4.2k

Ok, I think I have solved it. For some reason, you have to translate the entry first, as partial entries are impossible to translate, and then set it as partial entry:

from Bio.SeqFeature import SeqFeature, FeatureLocation, AfterPosition, BeforePosition

sf = SeqFeature(FeatureLocation(start,end), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))
#mark as partial
#both ends partial
if aa[0]!="M" and aa[-1]!="*":
    sf.location = FeatureLocation(BeforePosition(start-1),AfterPosition(end))
#left end partial
elif aa[0]!="M" or aa[-1]!="*" and strand==-1:
    sf.location = FeatureLocation(BeforePosition(start-1),end)
#right end partial
elif aa[-1]!="*" or aa[0]!="M" and strand==-1:
    sf.location = FeatureLocation(start-1,AfterPosition(end))

r.features.append(sf)
output.write(r.format('embl'))

Hope, someone else may benefit from that:)

ADD COMMENT

Login before adding your answer.

Traffic: 1512 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6