Question

Why are some codons containing uncertainties wrongly translated in BioPython?

5.9 years ago

Palming ▴ 10

I recently noticed that some of my DNA sequences were translated incorrectly by BioPython. More precisely, the problem affects some codons containing uncertainties (N), but not all of them.

To reproduce my problem, you can use the following code:

from Bio.Seq import Seq

wrong_translations = []
good_translations = []

uncertainties = ['AAN', 'ATN', 'ACN', 'AGN', 'ANA', 'ANT', 'ANC', 'ANG', 'ANN', 'TAN', 'TTN', 'TCN', 'TGN', 'TNA', 'TNT', 'TNC', 'TNG', 'TNN', 'CAN', 'CTN', 'CCN', 'CGN', 'CNA', 'CNT', 'CNC', 'CNG', 'CNN', 'GAN', 'GTN', 'GCN', 'GGN', 'GNA', 'GNT', 'GNC', 'GNG', 'GNN', 'NAA', 'NAT', 'NAC', 'NAG', 'NAN', 'NTA', 'NTT', 'NTC', 'NTG', 'NTN', 'NCA', 'NCT', 'NCC', 'NCG', 'NCN', 'NGA', 'NGT', 'NGC', 'NGG', 'NGN', 'NNA', 'NNT', 'NNC', 'NNG', 'NNN']

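for codon in uncertainties:
    # Codons containing N are expected to translate to 'X' (unknown residue)
    translation = Seq(codon).translate()
    if str(translation) == 'X':
        good_translations.append(codon)
    else:
        wrong_translations.append(codon)

print(len(good_translations))   # 53
print(len(wrong_translations))  # 8

# print() with a single argument works in both Python 2 and Python 3
for codon in wrong_translations:
    print('%s translated to %s' % (codon, Seq(codon).translate()))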

This consistently prints:

ACN translated to T
TCN translated to S
CTN translated to L
CCN translated to P
CGN translated to R
GTN translated to V
GCN translated to A
GGN translated to G

I don't understand why, out of the 61 possible codons containing uncertainties, 53 are correctly translated to 'X' while 8 are consistently translated to an actual amino acid. Am I missing something obvious, or is this a bug? Is there an explanation for this behavior?

I also tried a custom table containing only explicit, valid codons (none of them containing "N"), but I still get the same result.

I am using BioPython 1.68.
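As a quick check of the 61 figure: with five symbols (A, T, C, G, N) there are 5³ = 125 possible three-letter strings, and removing the 4³ = 64 ordinary codons leaves 61 that contain at least one N. The list does not have to be typed by hand; this sketch builds the same list, in the same order, with `itertools.product` (the variable name `n_codons` is only illustrative):

```python
from itertools import product

# Every three-letter string over A, T, C, G, N that contains at least one N.
n_codons = ["".join(c) for c in product("ATCGN", repeat=3) if "N" in c]

print(len(n_codons))  # 61
print(5**3 - 4**3)    # 61
```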

score 2 · Accepted Answer · 2018-06-04

5.9 years ago

Joe 21k

Are you accounting for the fact that some amino acids are fully determined by the first two bases, regardless of the final base?

This is all off the top of my head, but I'm pretty sure there are at least a few cases where the first and second bases are enough to determine the amino acid whatever the third base is. For example, glycine can be encoded by any of GGU, GGG, GGA or GGC, so GGN can only be glycine.
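A minimal sketch of that reasoning, independent of Biopython: it hard-codes the standard genetic code (NCBI table 1), expands each N to T, C, A and G, and keeps an amino acid only when every expansion agrees, otherwise returning `X`. This illustrates the logic only; it is not Biopython's actual implementation, and the helper names `expand` and `translate_ambiguous` are made up for the example. Run over all 61 N-containing codons, it reproduces exactly the eight translations from the question, which are the standard code's fourfold-degenerate codon families.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand(codon):
    """Return every concrete codon that an N-containing codon could stand for."""
    options = [BASES if base == "N" else base for base in codon]
    return ["".join(c) for c in product(*options)]


def translate_ambiguous(codon):
    """Return the amino acid if all expansions agree, otherwise 'X'."""
    residues = {CODON_TABLE[c] for c in expand(codon)}
    return residues.pop() if len(residues) == 1 else "X"


# Same 61 N-containing codons as in the question.
n_codons = ["".join(c) for c in product("ATCGN", repeat=3) if "N" in c]
translations = {c: translate_ambiguous(c) for c in n_codons}
resolved = {c: aa for c, aa in translations.items() if aa != "X"}

print(len(translations) - len(resolved))  # 53
# Prints the same eight lines, in the same order, as in the question.
for codon, aa in resolved.items():
    print("%s translated to %s" % (codon, aa))
```

The sketch only handles N (not other IUPAC codes such as R or Y) and does not try to reproduce how Biopython treats possible stop codons.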

Yes, you are completely right. I was thinking in "computer code" terms instead of "genetic code" terms: for these codons, whatever base the "N" stands for, the amino acid is the same! It still feels wrong to me that the codon is translated silently whether or not the uncertainty is there, but in the end it's no big deal. Thanks a lot!

5.9 years ago by Palming ▴ 10
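If you would rather get `X` whenever a codon contains an N at all (the "computer code" view from the reply), one possible workaround, which is a suggestion and not a Biopython option, is to mask those codons before translating. The question's results imply that `NNN` translates to `X` (it is not among the eight exceptions), so rewriting every N-containing codon as `NNN` should be enough. The helper `mask_ambiguous_codons` below is hypothetical and only does string processing:

```python
def mask_ambiguous_codons(dna):
    """Replace every complete codon containing N with NNN (hypothetical helper).

    Only N is handled; other IUPAC ambiguity codes are left untouched.
    A trailing partial codon (fewer than 3 bases) is kept as-is.
    """
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    masked = ["NNN" if len(c) == 3 and "N" in c else c for c in codons]
    return "".join(masked)


print(mask_ambiguous_codons("ATGGGNCTNTAA"))  # ATGNNNNNNTAA
```

The masked string can then be translated as before, e.g. `Seq(mask_ambiguous_codons(dna)).translate()`; for `ATGGGNCTNTAA` that should give `MXX*` instead of `MGL*`.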

OK, great. I've moved my post to an answer, so feel free to accept it if you're happy it's resolved :)

5.9 years ago by Joe 21k