Why are some codons containing uncertainties wrongly translated in BioPython?
5.9 years ago
Palming ▴ 10

I recently noticed that some of my DNA sequences were wrongly translated in BioPython. More precisely, the problem affects some codons containing uncertainties, but not all of them.

To reproduce my problem, you can use the following code:

from Bio.Seq import Seq

wrong_translations = []
good_translations = []

uncertainties = ['AAN', 'ATN', 'ACN', 'AGN', 'ANA', 'ANT', 'ANC', 'ANG', 'ANN', 'TAN', 'TTN', 'TCN', 'TGN', 'TNA', 'TNT', 'TNC', 'TNG', 'TNN', 'CAN', 'CTN', 'CCN', 'CGN', 'CNA', 'CNT', 'CNC', 'CNG', 'CNN', 'GAN', 'GTN', 'GCN', 'GGN', 'GNA', 'GNT', 'GNC', 'GNG', 'GNN', 'NAA', 'NAT', 'NAC', 'NAG', 'NAN', 'NTA', 'NTT', 'NTC', 'NTG', 'NTN', 'NCA', 'NCT', 'NCC', 'NCG', 'NCN', 'NGA', 'NGT', 'NGC', 'NGG', 'NGN', 'NNA', 'NNT', 'NNC', 'NNG', 'NNN']

for i in uncertainties:
    translation = Seq(i).translate()
    if translation == 'X':
        good_translations.append(i)
    else:
        wrong_translations.append(i)

len(good_translations)  # 53
len(wrong_translations)  # 8

for i in wrong_translations:
    print('%s translated to %s' % (i, Seq(i).translate()))

Which consistently gives:

ACN translated to T
TCN translated to S
CTN translated to L
CCN translated to P
CGN translated to R
GTN translated to V
GCN translated to A
GGN translated to G

I don't understand why, out of the 61 possible codons containing uncertainties, 53 are correctly translated to 'X' but 8 are consistently translated to an actual amino acid. Am I missing something obvious, or is this actually a bug? Is there an explanation for this behavior?

I tried using a custom table with explicit valid codons, none of them containing any "N", but I still have the same problem.

I am using BioPython 1.68.

biopython sequence translation python • 1.3k views
5.9 years ago
Joe 21k

Are you accounting for the fact that some amino acids are encoded regardless of the final base?

This is all off the top of my head, but I'm pretty sure there are at least a few cases where the first and second bases are sufficient to determine an amino acid regardless of the final base. For example, glycine is encoded by any of GGU, GGG, GGA, or GGC (GGT, GGG, GGA, or GGC in DNA), so GGN can only ever be glycine.
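To check this without relying on memory, here is a minimal sketch that expands every "N" into A, C, G and T and looks at which codons still give a single amino acid. It assumes the standard genetic code (NCBI table 1), typed out by hand so the snippet runs without Biopython. The name `translate_ambiguous` is made up for illustration and is not Biopython's own implementation.

```python
from itertools import product

# Standard genetic code (NCBI table 1), written out in TCAG order.
# Assumption: this is the table the question is using (Biopython's default).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_ambiguous(codon):
    """Hypothetical helper: return the amino acid if every way of
    replacing N with a real base gives the same residue, else 'X'."""
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


# Every codon over A, C, G, T, N that contains at least one N.
codons_with_n = ["".join(c) for c in product("ACGTN", repeat=3) if "N" in c]

# Keep only the codons whose amino acid is fully determined despite the N.
resolved = {}
for codon in codons_with_n:
    aa = translate_ambiguous(codon)
    if aa != "X":
        resolved[codon] = aa

print(len(codons_with_n), len(resolved))  # 61 8
for codon in sorted(resolved):
    print(codon, resolved[codon])
```

The eight codons it reports are the same eight from the question (ACN, TCN, CTN, CCN, CGN, GTN, GCN, GGN), i.e. the families where the third base never changes the amino acid.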


Yes, you are completely right. I was thinking in a "computer code" manner instead of a "genetic code" manner. So, indeed, whatever the "N" is, the amino acid will still be the same! It still feels wrong to me that the codon is automatically translated whether the uncertainty is there or not. But in the end, it's no big deal. Thanks a lot!
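If the automatic translation is unwanted, one possible workaround is to post-process the protein string and force 'X' wherever the underlying codon contains anything other than A, C, G or T. This is only a sketch: `mask_ambiguous` is a hypothetical helper, not part of Biopython, and it assumes the protein was translated in frame from the start of the DNA string.

```python
def mask_ambiguous(dna, protein, unambiguous="ACGT"):
    """Hypothetical helper: replace a residue with 'X' whenever the
    codon it came from contains a base outside `unambiguous`."""
    dna = dna.upper()
    allowed = set(unambiguous)
    masked = []
    for i, residue in enumerate(protein):
        codon = dna[3 * i : 3 * i + 3]
        # Keep the residue only if every base in its codon is unambiguous.
        masked.append(residue if set(codon) <= allowed else "X")
    return "".join(masked)


# "GKS" is what the thread reports for these codons (GGN -> G, TCN -> S).
print(mask_ambiguous("GGNAAATCN", "GKS"))  # XKX
```

With Biopython, the `protein` argument could come from `str(Seq(dna).translate())`.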


OK great, I've moved my post to an answer so feel free to accept it if you're happy it's resolved :)
