Question: Biopython Translation Error?
Mark Evans wrote, 3.2 years ago:

Hello, I am writing some code to translate ambiguous DNA codes into the possible amino acids, and I am seeing a strange result from the Biopython 1.56 package. It appears to translate some ambiguous DNA codons to 'J', which as far as I know is not a code for anything. I am running Python 2.6.1 on Mac OS X 10.6.6.

For example:

>>> from Bio.Seq import *
>>> translate('ARAWTAGKAMTA')
'XJXJ'

or

>>> from Bio.Seq import Seq
>>> c = Seq('ARAWTAGKAMTA')
>>> c.translate().tostring()
'XJXJ'

I have looked through the Bio.Data.CodonTable source and Bio.Seq source and I cannot find a reason why this would be happening. Any ideas?

Thanks!

Mark

Pierre Lindenbaum (France) wrote, 3.2 years ago:

Biopython seems to use an extended alphabet for the amino acids: see http://www.biopython.org/DIST/docs/api/Bio.Alphabet.IUPAC.ExtendedIUPACProtein-class.html

B = "Asx";  Aspartic acid (D) or Asparagine (N)
X = "Xxx";  Unknown or 'other' amino acid
Z = "Glx";  Glutamic acid (E) or Glutamine (Q)
J = "Xle";  Leucine (L) or Isoleucine (I), used in mass-spec (NMR)
U = "Sec";  Selenocysteine
O = "Pyl";  Pyrrolysine

Thanks Pierre, please see my followup -Mark

Reply written 3.2 years ago by Mark Evans
User 2510 wrote, 3.2 years ago:

I can explain the error in your second bit of code -- IUPACAmbiguousDNA is a class and needs to be instantiated, so

c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA)

should be

c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA() )
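
To see why the class itself gets rejected, here is a minimal sketch. It assumes the check behind the AssertionError is an isinstance test, which is what the traceback suggests but which I have not confirmed against the source. The two classes and check_alphabet are hypothetical stand-ins, not Biopython's real code:

```python
class Alphabet(object):
    """Hypothetical stand-in for the alphabet base class."""


class IUPACAmbiguousDNA(Alphabet):
    """Hypothetical stand-in for Bio.Alphabet.IUPAC.IUPACAmbiguousDNA."""


def check_alphabet(alphabet):
    # Assumed behaviour: only *instances* of Alphabet are accepted.
    # A class object is an instance of `type`, not of Alphabet.
    assert isinstance(alphabet, Alphabet), \
        "Invalid alphabet found, %r" % (alphabet,)
    return alphabet


check_alphabet(IUPACAmbiguousDNA())    # an instance: accepted

try:
    check_alphabet(IUPACAmbiguousDNA)  # the class itself: rejected
except AssertionError as err:
    print(err)
```

The second call fails in the same way as the traceback in the question, because the class was passed where an instance was expected.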

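As for the 'J': Bio/Data/IUPACData.py maps 'W' to 'A' or 'T', so the codon 'WTA' expands to 'ATA' and 'TTA'. Those translate to 'I' (isoleucine) and 'L' (leucine) respectively, and "I or L" is exactly what 'J' stands for in the extended protein alphabet.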
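
If it helps, the expansion can be reproduced without Biopython. This is only a rough sketch of the idea, not Biopython's actual implementation: all names below are made up for illustration, the table is the standard genetic code, and any mixture other than D/N, E/Q or I/L (including a mix of stop and non-stop codons) is simply reported as 'X'.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes for DNA.
AMBIGUOUS_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = dict(
    ("".join(codon), aa)
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
)

# Ambiguous protein letters from the extended alphabet quoted in this thread.
AMBIGUOUS_PROTEIN = {
    frozenset("DN"): "B",
    frozenset("EQ"): "Z",
    frozenset("IL"): "J",
}


def translate_ambiguous_codon(codon):
    """Return one protein letter for a possibly ambiguous DNA codon."""
    choices = [AMBIGUOUS_DNA[base] for base in codon]
    residues = frozenset(
        STANDARD_TABLE["".join(bases)] for bases in product(*choices)
    )
    if len(residues) == 1:
        return next(iter(residues))
    # Simplification: every other mixture is reported as 'X'.
    return AMBIGUOUS_PROTEIN.get(residues, "X")


def translate_ambiguous(dna):
    """Translate codon by codon; len(dna) must be a multiple of 3."""
    return "".join(
        translate_ambiguous_codon(dna[i:i + 3])
        for i in range(0, len(dna), 3)
    )


print(translate_ambiguous("ARAWTAGKAMTA"))  # XJXJ
```

Here 'ARA' expands to AAA (K) and AGA (R), and 'GKA' to GGA (G) and GTA (V); neither pair has a shared letter, so both become 'X'. 'WTA' and 'MTA' both come out as I or L, hence 'J'.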

I haven't found a way to force Seq.translate() to use IUPACProtein instead of ExtendedIUPACProtein, which might be what you want if you'd rather see 'X' than 'J'. An ugly fix would be to just use string replace:

   Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA()).translate().tostring().replace('J','X')

Ugly.

Mark Evans wrote, 3.2 years ago:

Thanks Pierre. That helps some. There is still something I must be missing though. You are right that ExtendedIUPACProtein uses 'J'. So in that case, based on my example, 'WTA' would be the corresponding codon. I still don't see where that gets mapped to 'J'.

ExtendedIUPACDNA defines 'W' as wyosine (I don't even know what that is; googling): http://biopython.org/DIST/docs/api/Bio.Alphabet.IUPAC.ExtendedIUPACDNA-class.html

B = 5-bromouridine
D = 5,6-dihydrouridine
S = thiouridine
W = wyosine

But the "normal" DNA ambiguity codes are in IUPACAmbiguousDNA: http://biopython.org/DIST/docs/api/Bio.Alphabet.IUPAC.IUPACAmbiguousDNA-class.html

letters = 'GATCRYWSMKHBVDN'

'W' traditionally codes for 'T' or 'A'.

If I get more specific in my example and specify an alphabet:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet.IUPAC import *
>>> c = Seq('ARAWTAGKAMTA',IUPACAmbiguousDNA)
>>> c.translate().tostring()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mark/Downloads/biopython-1.54/build/lib.macosx-10.6-universal-2.6/Bio/Seq.py", line 930, in translate
  File "/Users/mark/Downloads/biopython-1.54/build/lib.macosx-10.6-universal-2.6/Bio/Alphabet/__init__.py", line 213, in _get_base_alphabet
AssertionError: Invalid alphabet found, <class Bio.Alphabet.IUPAC.IUPACAmbiguousDNA at 0x10057c230>

Bad things happen. So I am still not quite understanding the ins and outs of this. The translate call goes to CodonTable where 1) I still don't see a 'J' and 2) I don't understand this new error.

Thanks!

Mark


The place in the code where that happens is https://github.com/biopython/biopython/blob/master/Bio/Data/CodonTable.py. All of the ambiguous codes are expanded and added to the forward translation table, which the Bio.Seq.translate method references indirectly.

Reply written 3.2 years ago by Paul J. Davis

IUPACAmbiguousDNA is the class; IUPACAmbiguousDNA() is an instance of the class. See shervold's answer.

Reply written 3.2 years ago by Peter