Biopython Translation Error?
13.6 years ago
Mark Evans ▴ 50

Hello, I am writing some code intended to translate ambiguous DNA codes into possible amino acids and I am seeing some strange translation from the Biopython 1.56 package. It appears to be translating ambiguous DNA codes to 'J' which does not exist as a code for anything. I am running python 2.6.1 on Mac OS 10.6.6.

For example:

>>> from Bio.Seq import translate
>>> translate('ARAWTAGKAMTA')
'XJXJ'

or

>>> from Bio.Seq import Seq
>>> c = Seq('ARAWTAGKAMTA')
>>> c.translate().tostring()
'XJXJ'

I have looked through the Bio.Data.CodonTable source and Bio.Seq source and I cannot find a reason why this would be happening. Any ideas?

Thanks!

Mark

biopython python protein translation • 6.7k views
13.6 years ago

Biopython seems to use an extended alphabet for the amino acids: see here

B = "Asx";  Aspartic acid (D) or Asparagine (N)
X = "Xxx";  Unknown or 'other' amino acid
Z = "Glx";  Glutamic acid (E) or Glutamine (Q)
J = "Xle";  Leucine (L) or Isoleucine (I), used in mass-spec (NMR)
U = "Sec";  Selenocysteine
O = "Pyl";  Pyrrolysine

Thanks Pierre, please see my followup -Mark

13.6 years ago
User 2510 ▴ 50

I can explain the error in your second bit of code -- IUPACAmbiguousDNA is a class and needs to be instantiated, so

c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA)

should be

c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA() )

Meanwhile, Bio/Data/IUPACData.py maps W to A,T, which means that WTA -> ATA,TTA -> I,L which is J.
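
To make that expansion concrete, here is a minimal pure-Python sketch of the idea. It is not Biopython's implementation: the names AMBIGUOUS_DNA, CODON_TABLE, AMBIGUOUS_PROTEIN, translate_codon and translate_ambiguous are made up for illustration, and the fallback to X for any unrecognised mix of residues (including a mix of stop and non-stop codons) is an assumption.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (hypothetical local copy,
# not imported from Biopython).
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Standard genetic code, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sets of residues that have their own ambiguity letter.
AMBIGUOUS_PROTEIN = {
    frozenset("DN"): "B",
    frozenset("EQ"): "Z",
    frozenset("IL"): "J",
}


def translate_codon(codon):
    """Translate one (possibly ambiguous) codon to a single letter."""
    # Expand the codon into every unambiguous codon it could stand for.
    expansions = product(*(AMBIGUOUS_DNA[base] for base in codon))
    residues = {CODON_TABLE["".join(c)] for c in expansions}
    if len(residues) == 1:
        return next(iter(residues))
    # Assumption: anything without a dedicated ambiguity letter becomes X.
    return AMBIGUOUS_PROTEIN.get(frozenset(residues), "X")


def translate_ambiguous(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


print(translate_ambiguous("ARAWTAGKAMTA"))
```

With this sketch, ARA expands to AAA/AGA (K or R, so X) and WTA expands to ATA/TTA (I or L, so J), which is consistent with the 'XJXJ' reported in the question.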

I haven't found a way to force Seq.translate() to use IUPACProtein instead of ExtendedIUPACProtein, which might be what you want if you'd rather see X than J. An ugly fix would be to just use string replace:

Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA()).translate().tostring().replace('J','X')

Ugly.
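
If the same translation might also produce other ambiguity letters such as B or Z, the replace could be generalised. The following is only a sketch on plain Python 3 strings: collapse_ambiguous_residues is a hypothetical helper, not part of Biopython, and the choice of which letters to collapse is an assumption.

```python
def collapse_ambiguous_residues(protein, ambiguous="BZJ", replacement="X"):
    """Replace every letter listed in `ambiguous` with `replacement`."""
    # Build a one-to-one translation table, e.g. B->X, Z->X, J->X.
    table = str.maketrans(ambiguous, replacement * len(ambiguous))
    return protein.translate(table)


print(collapse_ambiguous_residues("XJXJ"))
```

It would be applied to the string form of the translated sequence, in the same place as the .replace('J','X') call above.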

13.6 years ago
Mark Evans ▴ 50

Thanks Pierre. That helps some. There is still something I must be missing though. You are right that ExtendedIUPACProtein uses J. So in that case, based on my example, WTA would be the corresponding codon. I still don't see where that gets mapped to J.

ExtendedIUPACDNA defines W as wyosine (I don't even know what that is... googling): http://biopython.org/DIST/docs/api/Bio.Alphabet.IUPAC.ExtendedIUPACDNA-class.html

B = 5-bromouridine
D = 5,6-dihydrouridine
S = thiouridine
W = wyosine

but "normal" DNA ambiguity codes are here in IUPACAmbiguousDNA.

letters = 'GATCRYWSMKHBVDN'

W traditionally codes for T or A

If I get more specific in my example and specify an alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
>>> c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA)
>>> c.translate().tostring()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mark/Downloads/biopython-1.54/build/lib.macosx-10.6-universal-2.6/Bio/Seq.py", line 930, in translate
  File "/Users/mark/Downloads/biopython-1.54/build/lib.macosx-10.6-universal-2.6/Bio/Alphabet/__init__.py", line 213, in _get_base_alphabet
AssertionError: Invalid alphabet found, <class Bio.Alphabet.IUPAC.IUPACAmbiguousDNA at 0x10057c230>

Bad things happen. So I am still not quite understanding the ins and outs of this. The translate call goes to CodonTable, where 1) I still don't see a 'J' and 2) I don't understand this new error.

Thanks!
Mark


The place in the code where that happens is Bio/Data/CodonTable.py (https://github.com/biopython/biopython/blob/master/Bio/Data/CodonTable.py). All of the ambiguous codes are expanded and shoved into the forward translation table, which is referenced indirectly from the Bio.Seq.translate method.


IUPACAmbiguousDNA is the class, IUPACAmbiguousDNA() is an instance of the class. See shervold's answer.
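
The class-versus-instance distinction can be reproduced without Biopython. In this sketch, Alphabet, AmbiguousDNA, get_base_alphabet and accepts are hypothetical stand-ins; the isinstance check is only an assumption about the kind of test that could lead to the "Invalid alphabet found" AssertionError in the traceback.

```python
class Alphabet:
    """Hypothetical stand-in for an alphabet base class."""
    letters = None


class AmbiguousDNA(Alphabet):
    """Hypothetical stand-in for an ambiguous DNA alphabet."""
    letters = "GATCRYWSMKHBVDN"


def get_base_alphabet(alphabet):
    # An instance of AmbiguousDNA is an Alphabet; the class object itself
    # is an instance of `type`, so it fails this check.
    if not isinstance(alphabet, Alphabet):
        raise AssertionError("Invalid alphabet found, %r" % (alphabet,))
    return alphabet


def accepts(alphabet):
    """Return True if get_base_alphabet() accepts the argument."""
    try:
        get_base_alphabet(alphabet)
    except AssertionError:
        return False
    return True


print(accepts(AmbiguousDNA()))  # instance
print(accepts(AmbiguousDNA))    # class object
```

Passing the bare class fails the check while passing AmbiguousDNA() passes it, which mirrors why adding the parentheses fixes the call.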
