Question

How to back translate the protein into amino acids sequence with python?

0

Entering edit mode

5.3 years ago

macadamopetrus • 0

Dear all, I am trying to translate protein back into RNA sequences. And when doing so, this kind of back translation gives a lot of RNA sequences.

So for that, I wrote a dictionary of all the possibilities present. And then trying to back translate it with python. But the first problem is I don't want to blow up my computer by printing out the result. For that, I made a mock fasta file containing a mock sequence and try to run this. This file contains MPG and MPPP as protein sequences: -But this doesn't yield the result as PPP 3 times will be seen as 1 time? Suggestions anyone?

Thank u

RNA-Seq protein sequence python • 6.3k views

ADD COMMENT • link 5.3 years ago by macadamopetrus • 0

0

Entering edit mode

Please correct you title, I believe you mean How to back back translate the protein into RNA sequence with python?

Show the code you wrote so far.

ADD REPLY • link 5.3 years ago by h.mon 35k

0

Entering edit mode

ok

My mock protein are in fasta file called HIJ2. fasta. Protein sequences are MPG and MPPP.

    amino_x = {   
    'M': ['GCU', 'GCC', 'GCA'], 
    'P': ['UGU','UGC'],
    'G': ['AAA']
     }

from Bio import SeqIO
for protein in SeqIO.parse("HIJ2.fasta","fasta"): 
    for i in protein:
        amino_acids = list()

        for value in protein:
            amino_acids.append(amino_x[value])
    print(amino_acids)

from itertools import product
    list(itertools.product(amino_acids))

This code is with list.

ADD REPLY • link updated 5.3 years ago by GenoMax 141k • written 5.3 years ago by macadamopetrus • 0

0

Entering edit mode

Sorry, I tried to put the code inside the code kernel but this doesn't work.

ADD REPLY • link 5.3 years ago by macadamopetrus • 0

0

Entering edit mode

Like this?

from Bio import SeqIO
import itertools

amino_x = {
    'M': ['GCU', 'GCC', 'GCA'],
    'P': ['UGU','UGC'],
    'G': ['AAA']
 }

for protein in SeqIO.parse("HIJ2.fasta","fasta"):
    printprotein.id)
    #    ^ there must be a ( after print. Due to a known bug in this forum software it is not shown.
    amino_acids = list()

    for value in protein.seq:
        amino_acids.append(amino_x[value])

    for translate_back in itertools.product(*amino_acids):
        print(translate_back)

Output:

A
('GCU', 'UGU', 'AAA')
('GCU', 'UGC', 'AAA')
('GCC', 'UGU', 'AAA')
('GCC', 'UGC', 'AAA')
('GCA', 'UGU', 'AAA')
('GCA', 'UGC', 'AAA')
B
('UGU', 'UGU', 'UGU')
('UGU', 'UGU', 'UGC')
('UGU', 'UGC', 'UGU')
('UGU', 'UGC', 'UGC')
('UGC', 'UGU', 'UGU')
('UGC', 'UGU', 'UGC')
('UGC', 'UGC', 'UGU')
('UGC', 'UGC', 'UGC')

HIJ2.fasta

>A
MPG
>B
PPP

ADD REPLY • link 5.3 years ago by finswimmer 16k

0

Entering edit mode

Can this code work for longer protein sequences? And how can I add some code that will only print out, lets say 3 rna sequences and not all of them.

ADD REPLY • link 5.3 years ago by macadamopetrus • 0

score 1 · Answer 1 · 2019-01-08

You haven't said what the error was with your code, but the following subtly modified version works for me:

Input:

>Pep1
PPP
>Pep2
MPG

Code:

import sys
from Bio import SeqIO
from itertools import product

amino_x = {'M': ['GCU', 'GCC', 'GCA'],
           'P': ['UGU','UGC'],
           'G': ['AAA']}

for protein in SeqIO.parse(sys.argv[1], "fasta"):
    for i in protein:
        amino_acids = []
        for residue in protein:
            amino_acids.append(amino_x[residue])
    print("{} -> {}".formatprotein.id, amino_acids))

    for a in product(*amino_acids):
        print(a)

Yields:

Pep1 -> [['UGU', 'UGC'], ['UGU', 'UGC'], ['UGU', 'UGC']]
('UGU', 'UGU', 'UGU')
('UGU', 'UGU', 'UGC')
('UGU', 'UGC', 'UGU')
('UGU', 'UGC', 'UGC')
('UGC', 'UGU', 'UGU')
('UGC', 'UGU', 'UGC')
('UGC', 'UGC', 'UGU')
('UGC', 'UGC', 'UGC')
Pep2 -> [['GCU', 'GCC', 'GCA'], ['UGU', 'UGC'], ['AAA']]
('GCU', 'UGU', 'AAA')
('GCU', 'UGC', 'AAA')
('GCC', 'UGU', 'AAA')
('GCC', 'UGC', 'AAA')
('GCA', 'UGU', 'AAA')
('GCA', 'UGC', 'AAA')

Is that not what you were looking for?

Just FYI, for a long sequence, this back translation is going to get huge.

For a peptide of just 19 residues, the run time is already over 5 seconds (there is a reason BioPython doesn't offer back translation natively):

real    0m5.169s
user    0m0.568s
sys     0m0.376s

For 30 residues this climbs to over 48 minutes! - and I have no idea how much longer it would have gone on for, I killed it after I got bored!

  File "backtranslate.py", line 17, in <module>
    print(a)
KeyboardInterrupt

real    48m41.443s
user    9m37.704s
sys     6m38.621s

And this only considering 3 amino acids with a total of 6 codons. This will simply be incalculable if you want to do anything more complicated.

score 0 · Answer 2 · 2019-01-09

0

Entering edit mode

5.3 years ago

finswimmer 16k

You have to break the for after you reached the number of sequences you like to print.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys
from argparse import ArgumentParser
from Bio import SeqIO
import itertools


def main():
    args = get_args()

    amino_x = {
        'M': ['GCU', 'GCC', 'GCA'],
        'P': ['UGU','UGC'],
        'G': ['AAA']
    }

    for protein in SeqIO.parse(args.input,"fasta"):
        printprotein.id)
        #    ^ there must be a ( after print. Due to a known bug in this forum software it is not shown.
        amino_acids = list()

        for value in protein.seq:
            amino_acids.append(amino_x[value])

        for i, translate_back in enumerate(itertools.product(*amino_acids)):
            print(translate_back)

            if i == args.number-1:
                break


def get_args():
    parser = ArgumentParser(prog="translate.py", description="")
    parser.add_argument('--version', action='version', version='%(prog)s 0.1')

    parser.add_argument("input", help="input file, use - to read from stdin")
    parser.add_argument("-n", "--number", help="number of translations",  type=int, required=True)
    parser.add_argument("-o", "--output", help="output file")

    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)

    args = parser.parse_args()

    if args.output:
        sys.stdout = open(args.output, "w")

    if args.input != "-":
        sys.stdin = open(args.input, "r")

    return args


if __name__ == '__main__':
    main()

Example for getting max sequences per protein:

$ python translate.py -n 3 protein.fa

But could you please explain for what you need this?

fin swimmer

ADD COMMENT • link 5.3 years ago by finswimmer 16k

0

Entering edit mode

Thank you, We are doing a project for python and trying to make some codes.

ADD REPLY • link 5.3 years ago by macadamopetrus • 0

0

Entering edit mode

And for that reason you just make someone else do your assignment?

ADD REPLY • link 5.3 years ago by WouterDeCoster 47k

0

Entering edit mode

No, I am not going to use his code I wrote my own. What I do is break down the code to understand it. This is what we wrote.

codons_ = [CODONS[codon] for codon in record.seq] + [CODONS['Stop']]
data[record.id] = ["".join(result) for result in itertools.islice(itertools.product(*codons_, repeat=1),10)]
 print(data)

Have a nice day Wouter.

ADD REPLY • link 5.3 years ago by macadamopetrus • 0

0

Entering edit mode

I think what fin swimmer meant was what is the biological purpose of the code?

Back translation isnt something people really ever do because its computationally massive/slow. There's no code you (or anyone) can write that can make this work fast enough for a 'real' example.

ADD REPLY • link 5.3 years ago by Joe 21k

0

Entering edit mode

Sorry for late reply. I just had a lot of work. Back translation was one of the questions in my project.

For this back translation: We made a dictionary of the possible translations and added a limit to it. Let's say you can add 10, 20.... possibilities. And then this was used to count the nucleotide from the sequence and nucleotide were plotted.

And thank u also for all your help.

ADD REPLY • link 5.3 years ago by macadamopetrus • 0

1

Entering edit mode

I've moved both of the primary comments in this thread to answers of their own to provide some closure to the question. If the answers helped/resolved your issues, be sure to accept at least one (you can accept as many as you like), using the tick mark at the left of the post.

ADD REPLY • link 5.3 years ago by Joe 21k