Question: Python Script To Translate Rna Sequences To Protein Sequences
10.0 years ago by Studentguy
Studentguy wrote:

Write a Python script that translates two genes in an RNA sequence into their protein sequence and prints them. Each gene begins with an AUG from the left and ends in UAG and has a length that is a multiple of three. However, the RNA sequence length may not be a multiple of three and there may be more than one "UAG" or "AUG" in the sequence.

For example, if the input is

human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

with open ("p:/dna.txt", "r") as myfile:
    data=myfile.readlines()

map = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

DNA=data[1]
flag = 1
while flag:
    start = DNA.find('AUG')
    if start == -1:
        flag = 0
    else:
        done = 0
        while done!= 0:
            i = start
            codon = DNA[1:i+3]
            if codon == "UAG":
                stop = i
                protein = translate(DNA(start))
                DNA = DNA[stop:]
                done = 1
                print(protein)

then the output should be

MLE MY

I have this so far: http://dpaste.org/v2e9/ Can anyone help out?

python translation biopython • 47k views
modified 7.1 years ago by viv_bio • written 10.0 years ago by Studentguy

@Simon: while I sometimes feel irked when I see a question that seems to be taken straight out of a homework assignment, I think in the end it is not our job to police this. Plus, we may be wrong in our assumptions. So I would leave it up to each person's individual judgment whether they want to answer or not. A great answer lives on and will continue to provide value beyond the original poster's needs.

written 10.0 years ago by Istvan Albert

I do agree; however, this question does come with a partial solution if you follow the link to the OP's 'dpaste' page. Here is a thought: with enough googling, StudentGuy will come across a ready-made solution anyway, most probably one using Biopython, which he will likely not understand and which will be too high-level (a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better-developed question that included the code right here might have been better. Cheers

written 10.0 years ago by Eric Normandeau

I'd be interested in other moderators' opinions of homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than 'do my homework for me' style questions.

written 10.0 years ago by Simon Cockell

just need to find an otherwise permissive license that prohibits copy-paste use into a homework solution

written 10.0 years ago by brentp

@brentp: The people who are copy-pasting homework probably aren't reading enough to look at the licenses anyways.

written 10.0 years ago by Will

some consolation: I appreciate the frankness in writing it up as HW!!

written 10.0 years ago by Rm

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most unis now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".

I posted more below in the answers section :D

written 10.0 years ago by Studentguy

10.0 years ago by brentp, Salt Lake City, UT
brentp wrote:

homework? well, i'll answer anyway using biopython.

from Bio.Seq import Seq

# add your own logic here to parse the rna sequence from the file.
# split on start codon. drop the part preceding the 1st start codon,
# then for each chunk, translate to the stop codon. then join and print.
# (Bio.Alphabet was removed in Biopython 1.78, so no alphabet argument is passed to Seq.)
print(" ".join(str(Seq("AUG" + rest).translate(to_stop=True))
               for rest in "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA".split("AUG")[1:]))
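
For readers without Biopython installed, the same split-on-start-codon idea can be sketched in plain Python. This is only an illustration, not Biopython's implementation: the helper names `translate_to_stop` and `translate_genes` are made up for this sketch, and the codon table is built from the standard genetic code with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate_to_stop(rna):
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):    # a trailing partial codon is ignored
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def translate_genes(rna):
    """Split on the start codon and translate each chunk up to its stop codon."""
    chunks = rna.split("AUG")[1:]           # drop whatever precedes the first AUG
    return [translate_to_stop("AUG" + rest) for rest in chunks]

print(" ".join(translate_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Two caveats about the approach itself: it stops at any stop codon (UAA and UGA as well as UAG), and splitting on "AUG" would cut a gene in two if that gene contained another AUG, so it suits the example input better than arbitrary sequences.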
modified 13 months ago by RamRS • written 10.0 years ago by brentp

10.0 years ago by Pierre Lindenbaum, France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:

I'm not a Python guy, but the following script does the job with dna.txt containing:

>Human
ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

Source: you just need to check that at least three bases are available after the current position:

with open ("dna.txt", "r") as myfile:
    data=myfile.readlines()

map = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
       "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
       "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
       "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
       "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
       "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
       "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
       "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
       "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
       "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
       "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
       "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
       "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
       "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
       "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
       "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

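DNA = data[1].strip()  # second line holds the sequence; strip() drops the trailing newline
proteins = []
start = DNA.find('AUG')
while start != -1:
    protein = ""
    # read codons in frame; stop when fewer than 3 bases remain
    while start + 2 < len(DNA):
        codon = DNA[start:start+3]
        start += 3
        if codon == "UAG":
            break
        protein += map[codon]
    proteins.append(protein)
    start = DNA.find('AUG', start)  # look for the next gene after this one
print(" ".join(proteins))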
modified 13 months ago by RamRS • written 10.0 years ago by Pierre Lindenbaum

Hi, I ran your code, but there seems to be a problem with it. When I read a data file, \n characters are also read, and the start variable always seems to get the value -1 even when the substring is present in DNA.

written 8.9 years ago by User 5037290

When I run your code I get this error:

start, end = next_transcript(mRNA, cur_pos)
TypeError: 'NoneType' object is not iterable

I also noticed that cur_pos isn't defined until later, on line 43. Could this be the problem? I am also not sure why DNA=data[1].strip() takes the second item (index 1) of the list. How is your dna.txt formatted? Clarification would be much appreciated. Thanks
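
On the file-format question: the example file in the answer above has a header line (`>Human`) followed by the sequence on the next line, which is why that code takes `data[1]` and strips it. Below is a minimal sketch of that reading step, assuming a FASTA-like file with `>` header lines and a sequence that may be wrapped over several lines. The function name `read_sequence` and the temporary demo file are hypothetical, used only to make the example self-contained.

```python
import os
import tempfile

def read_sequence(path):
    """Return the sequence from a FASTA-like file, skipping '>' header lines."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()             # drops the trailing "\n" kept by file reads
            if line and not line.startswith(">"):
                parts.append(line)
    return "".join(parts)

# Write a small demo file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(">Human\nACAUGCUAGAAUAGCCGCAUGUACUAGUUAA\n")

rna = read_sequence(tmp.name)
os.remove(tmp.name)
print(rna)               # ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA
print(rna.find("AUG"))   # 2
```

If `find('AUG')` returns -1 on a file that visibly contains AUG, it may be worth checking that the sequence line rather than the header is being searched, and that the file uses U rather than T.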

modified 13 months ago by RamRS • written 7.6 years ago by dolbema

10.0 years ago by Studentguy
Studentguy wrote:

Thank you all for the fast responses; to those who helped, I greatly appreciate it. I ended up with an alternate solution.

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most unis now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".

Here's the code I ended up using. It does away with the loop structure I couldn't get and kind of cheats by using find/rfind to narrow down the two sequences. It won't work for more than two sequences, but it does the job nonetheless.

http://dpaste.org/RyUA/

written 10.0 years ago by Studentguy

Nice work with this. Your logic is sound and the major remaining issue to tackle is repetitiveness in the code. Whenever you repeat code with a few changes, you should focus on making that code into a function. Here's a next version of your code generalized with functions and a while loop. It's a combination of your first and second attempts. Notice how the functions allow you to avoid re-writing the same code multiple times: http://gist.github.com/626765
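
A rough sketch of what a function-based version with a while loop might look like follows. It is not the code from the linked gist. The name `next_transcript` is borrowed from the traceback quoted in an earlier comment, but its body here is a guess, and `translate`, `find_proteins`, and the abbreviated codon table are likewise assumptions for illustration. Returning `None` when no complete gene remains, and checking for it before unpacking, avoids a `TypeError: 'NoneType' object is not iterable`.

```python
# Abbreviated table: only the codons that occur in the example input.
# Swap in the full codon table from the question for real data.
CODON_TABLE = {"AUG": "M", "CUA": "L", "GAA": "E", "UAC": "Y"}
START, STOP = "AUG", "UAG"

def next_transcript(mrna, cur_pos):
    """Return (start, end) of the next AUG...UAG gene at or after cur_pos, else None.

    `end` is the index of the in-frame stop codon.
    """
    start = mrna.find(START, cur_pos)
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):    # step codon by codon, in frame
        if mrna[i:i + 3] == STOP:
            return start, i
    return None                                  # start codon without an in-frame stop

def translate(gene):
    """Translate a gene whose length is a multiple of three (stop codon excluded)."""
    return "".join(CODON_TABLE[gene[i:i + 3]] for i in range(0, len(gene), 3))

def find_proteins(mrna):
    """Collect the protein for every complete gene, scanning left to right."""
    proteins = []
    cur_pos = 0
    while True:
        found = next_transcript(mrna, cur_pos)
        if found is None:                        # check before unpacking
            break
        start, end = found
        proteins.append(translate(mrna[start:end]))
        cur_pos = end + 3                        # resume after the stop codon
    return proteins

print(" ".join(find_proteins("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Stepping three bases at a time from the start codon matters for this input: the first gene contains an out-of-frame "UAG" (inside "CUAGAA"), so a plain `find("UAG")` would end the gene too early.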

written 10.0 years ago by Brad Chapman

9.0 years ago by User 4133
User 4133 wrote:

Probably this can help.

It is in Italian, but the python code is in English.

Bye Wollo

modified 13 months ago by RamRS • written 9.0 years ago by User 4133

7.1 years ago by viv_bio, IIT Bombay
viv_bio wrote:

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def make_trans_record(record):
    """Return a new SeqRecord holding the translated sequence."""
    # The slice [350:-103] trims the record before translating; the offsets
    # are the poster's own and will likely need changing for other data.
    return SeqRecord(seq=record.seq[350:-103].translate(),
                     id=record.id,
                     description="")

# input() replaces Python 2's raw_input()
in_path = input("Enter file location of the nucleotide sequence: ")
out_path = input("Output file location and name: ")

records = (make_trans_record(rec) for rec in SeqIO.parse(in_path, "fasta"))

SeqIO.write(records, out_path, "fasta")
modified 13 months ago by RamRS • written 7.1 years ago by viv_bio