User's choice for three- or single-letter amino acids in reading frames
15 months ago

Hi all, I have written a program that translates a given cDNA sequence into six reading frames with three-letter amino acids. I need to change the code so that it asks the user whether they want to see the translations with three-letter or single-letter amino acids. For this, I have added another codon table containing the single-letter amino acids. I am a bit confused about how to change the code; I tried something like this, but it didn't work:

choice = input("Single-letter or three-letter? Type 'S' for single or 'T' for three: ").strip().upper()
if choice == "S":
    # (here I put all the code except the code for loading the FASTA file and the sequence, and just changed the name of the codon table to the one containing the single-letter amino acids)
elif choice == "T":
    # (here I put my original code, which uses a different name for the codon table containing the three-letter amino acids)
else:
    print("Please choose either 'S' or 'T'.")

Could you please suggest another way of doing it?

Many thanks

Note: Unfortunately, as I will have to submit my project online, I cannot post my code here.
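
One way to avoid duplicating the whole program under each branch is to let the user's choice select a codon table and then pass that table to a single translation function. The sketch below is only an illustration of that idea: the abbreviated tables and the names `pick_table`, `translate` and `ask_for_table` are hypothetical and not taken from the poster's code.

```python
# Hypothetical, abbreviated tables for illustration; a real table has all 64 codons.
ONE_LETTER = {"ATG": "M", "GAG": "E", "TAA": "*"}
THREE_LETTER = {"ATG": "Met", "GAG": "Glu", "TAA": "***"}
TABLES = {"S": ONE_LETTER, "T": THREE_LETTER}


def pick_table(choice):
    """Return the codon table for 'S' or 'T' (any case), or None if invalid."""
    return TABLES.get(choice.strip().upper())


def translate(seq, codon_table):
    """Translate seq codon by codon, ignoring a trailing partial codon."""
    seq = seq.upper()
    return [codon_table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3)]


def ask_for_table():
    """Keep asking until the user types 'S' or 'T'."""
    while True:
        table = pick_table(input("Type 'S' for single-letter or 'T' for three-letter: "))
        if table is not None:
            return table
        print("Please choose either 'S' or 'T'.")
```

With this layout the rest of the program calls `translate(sequence, ask_for_table())` once, so the six-frame logic exists in only one place.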

python • 933 views

As long as it gets the job done, your code is OK for learning purposes. Months or years down the line, you'll look back and wonder why you even wrote the code, so don't spend too much time optimizing it. Like Mensur talked about in your earlier post, you won't reinvent the wheel in real life implementations, so it's not a great idea to optimize code when the overall task is kind of useless.

But I have to make it work, and my code is not working.

Ahh, I missed that part, sorry. "It doesn't work" is not enough, though: where does it fail? What error messages, if any, do you see?

This is the error:

in translate
    amino_acid = codon_table[codon]

KeyError: 'gag'

and the codon table is:

codon_table1 = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',                
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}

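Dictionary lookups are case-sensitive: your sequence is lowercase ('gag'), but the keys in your codon table are uppercase ('GAG'). Convert the codon (or the whole sequence) to uppercase before the lookup and you should be one step closer to solving this problem.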
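
As a minimal sketch of that fix, assuming a table keyed by uppercase codons like the one posted (the two-entry table and the helper name `lookup` are hypothetical, for illustration only):

```python
# Abbreviated, hypothetical table; keys are uppercase like the posted codon_table1.
codon_table = {"GAG": "E", "CAT": "H"}


def lookup(codon, table):
    """Look up a codon regardless of the case used in the input sequence."""
    return table[codon.upper()]


# 'gag' is not a key, which is what triggers the KeyError in the traceback.
lowercase_key_present = "gag" in codon_table
residue = lookup("gag", codon_table)
```

Upper-casing the whole sequence once, right after reading the FASTA file, achieves the same thing without touching every lookup.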

Thank you, that worked, but now I have another problem: the reading frames with single-letter amino acids are longer than the strands. This is the output:

  E  H  V  G  L  V  L  C  *  V  L  *  S  R  *  E  S  E  G  E  G  L  Q  I  R  P  V  L  S  H  E  A  E  T  L  K  P  D  Y  L  G  P  N  L  G  L  G  I  S  S  L  C  D  S  G  L  R  H  G  A     F1
  S  M  L  A  W  S  F  A  R  Y  C  R  A  G  E  R  V  R  G  K  D  S  K  L  D  Q  F  L  A  M  K  Q  R  L  *  S  Q  T  T  W  V  P  I  L  G  L  V  F  P  R  C  V  T  L  D  C  A  M  G  L    F2
   A  C  W  P  G  P  L  L  G  T  V  E  Q  V  R  E  *  G  G  R  T  P  N  *  T  S  S  *  P  *  S  R  D  S  E  A  R  L  P  G  S  Q  S  W  A  W  Y  F  L  A  V  *  L  W  T  A  

P  W  G  S   F3

   3 gagcatgttggcctggtcctttgctaggtactgtagagcaggtgagagagtgagggggaa   60
   3 ctcgtacaaccggaccaggaaacgatccatgacatctcgtccactctctcactccccctt   60
   R  T  T  G  P  G  N  D  P  *  H  L  V  H  S  L  T  P  P  S  *  G  L  I  W  S  R  I  G  T  S  S  L  R  L  R  S  D  G  P  R  V  R  T  R  T  I  K  E  R  H  T  E  T  *  R  G  T  P  S   F6
  S  Y  N  R  T  R  K  R  S  M  T  S  R  P  L  S  H  S  P  F  L  R  F  N  L  V  K  N  R  Y  F  V  S  E  T  S  V  *  W  T  Q  G  *  N  P  N  H  K  G  A  T  H  *  D  L  T  R  Y  P  E    F5
 L  V  Q  P  D  Q  E  T  I  H  D  I  S  S  T  L  S  L  P  L  P  E  V  *  S  G  Q  E  S  V  L  R  L  *  D  F  G  L  M  D  P  G  L  E  P  E  P  *  R  S  D  T  L  R  P  D  A  V  P  R     F4

and this is the line that prints frame 5:

print('      {:<60}    F5'.format('  '.join(line_protein5)))

Can I add anything to this line of code so that the output is 60 characters long (including the two spaces between each single-letter amino acid)?

Please format your post properly. I cannot understand your output formatting problem unless you show me the bad output you are getting as well as the output you expect.

[image]

and I need to get this: [image]

You need to revisit how you calculate spacing. You're writing fixed width output, so ensure there is sufficient space for various elements and calculate the width, then drill down into individual placements. Start with the known length - 60 for the sequence and AA lines, then calculate spacing for the left and right margin content - the coordinates and the frame-number markers. There are plenty of wrong assumptions in your current code. For example, your F1, F2 and F3 are spaced equally away from their amino acids which should not be the case.

Plus there seems to be a huge gap in your logic - the nucleotide sequence is only 60 bases but the peptide sequence is a lot longer than 20 AAs. Fixing that should probably fix the multi-line AA sequence problem.

These are points you should be noticing yourself by comparing your expected and actual outputs; I don't see why I am having to do this for you.

