Question: How To Add Partial Cds Seqfeature In Biopython?
0
gravatar for Leszek
6.3 years ago by
Leszek4.0k
IIMCB, Poland
Leszek4.0k wrote:

I'm trying to create partial CDS SeqFeature in BioPython:

from Bio.SeqFeature import SeqFeature, FeatureLocation, AfterPosition, BeforePosition

sf = SeqFeature(FeatureLocation(start,end), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))
#mark as partial
if aa[0]!="M":
    sf.location.start.position = BeforePosition(start)
if aa[-1]!="*":
    sf.location.end.position = AfterPosition(end)
r.features.append(sf)

but when I write embl record I got an error:

output.write(r.format('embl'))
File "/usr/lib/pymodules/python2.7/Bio/SeqFeature.py", line 663, in __eq__
    "We can only do comparisons between Biopython Position objects."
AssertionError: We can only do comparisons between Biopython Position objects.

If I first create partial SeqFeature, then I'm unable to translate it:

sf = SeqFeature(FeatureLocation(start,AfterPosition(end)), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))

File "/usr/lib/pymodules/python2.7/Bio/SeqFeature.py", line 627, in <lambda>
    fget=lambda self: self._end.position + self._end.extension,
TypeError: unsupported operand type(s) for +: 'AfterPosition' and 'int'

Anyone has an idea how to do it correctly in BioPython?

cds biopython python • 2.0k views
ADD COMMENTlink modified 2.1 years ago by Biostar ♦♦ 20 • written 6.3 years ago by Leszek4.0k

The test for the first amino acid being M is wrong, you should be checking the first three bases are a start codon. Non-standard start codons become M when used as start codon, but would usually translate as something else (e.g. Valine).

ADD REPLYlink written 6.3 years ago by Peter5.8k
0
gravatar for Leszek
6.3 years ago by
Leszek4.0k
IIMCB, Poland
Leszek4.0k wrote:

Ok, I think I have solved it. For some reason, you have to translate the entry first, as partial entries are impossible to translate, and then set it as partial entry:

from Bio.SeqFeature import SeqFeature, FeatureLocation, AfterPosition, BeforePosition

sf = SeqFeature(FeatureLocation(start,end), strand=1, type='CDS', id='genename')
seq = sf.extract(r.seq)
aa = str(seq.translate(table=gcode))
#mark as partial
#both ends partial
if aa[0]!="M" and aa[-1]!="*":
    sf.location = FeatureLocation(BeforePosition(start-1),AfterPosition(end))
#left end partial
elif aa[0]!="M" or aa[-1]!="*" and strand==-1:
    sf.location = FeatureLocation(BeforePosition(start-1),end)
#right end partial
elif aa[-1]!="*" or aa[0]!="M" and strand==-1:
    sf.location = FeatureLocation(start-1,AfterPosition(end))

r.features.append(sf)
output.write(r.format('embl'))

Hope, someone else may benefit from that:)

ADD COMMENTlink written 6.3 years ago by Leszek4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1640 users visited in the last hour