Python Script To Translate Rna Sequences To Protein Sequences
5
4
11.1 years ago
Studentguy ▴ 70

Write a Python script that translates two genes in an RNA sequence into their protein sequence and prints them. Each gene begins with an AUG from the left and ends in UAG and has a length that is a multiple of three. However, the RNA sequence length may not be a multiple of three and there may be more than one "UAG" or "AUG" in the sequence.

For example, if the input is

human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA

then the output should be

MLE MY

This is what I have so far (see also http://dpaste.org/v2e9/). Can anyone help out?

with open("p:/dna.txt", "r") as myfile:
    # Assumes the file holds a label and the sequence separated by whitespace,
    # e.g. "human ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"
    data = myfile.read().split()

# Standard genetic code: RNA codon -> one-letter amino acid
codon_table = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

def translate(rna):
    """Translate rna codon by codon, stopping at the first stop codon."""
    protein = ""
    for i in range(0, len(rna) - 2, 3):
        amino_acid = codon_table[rna[i:i+3]]
        if amino_acid == "STOP":
            break
        protein += amino_acid
    return protein

DNA = data[1]  # the sequence (RNA, despite the variable name)
proteins = []
start = DNA.find("AUG")
while start != -1:
    protein = translate(DNA[start:])
    proteins.append(protein)
    # Skip past this gene (its amino acids plus the stop codon) before searching again
    start = DNA.find("AUG", start + 3 * (len(protein) + 1))
print(" ".join(proteins))

python biopython translation • 55k views
2

@Simon: while I sometimes feel irked when I see a question that seems to be taken right out of a homework assignment, I think in the end it is not our job to police this. Plus, we may be wrong in our assumptions. So I would leave it up to each person to decide whether they want to answer. A great answer lives on and will continue to provide value beyond the original poster's needs.

1

I do agree. However, for this question there is a partial solution if you follow the link to the OP's dpaste page. Here is a thought: with enough googling, StudentGuy will come up with a ready-made solution anyway, most probably using Biopython, which he will likely not understand and which will be too high-level (a ready-made package) to have much teaching value. At least here the OP did part of the work and is ready to interact with people who will likely teach him something. A better developed question that included the code right here might have been better. Cheers

0

I'd be interested in other moderators' opinions on homework questions. I think proof of a reasonable stab at a solution would be a good thing, rather than 'do my homework for me' style questions.

0

just need to find an otherwise permissive license that prohibits copy-paste use into a homework solution

0

@brentp: The people who are copy-pasting homework probably aren't reading enough to look at the licenses anyways.

0

Some consolation: I appreciate the frankness of labeling it as HW!!

0

I am not really looking for handouts, but I am new to programming and am occasionally very confused by Python. Most unis now require all bio majors to be thrown into the pit with a BNFO class, since even limited knowledge of the subject is a valuable skill to have. That being said, all of our teachers here are pretty seasoned vets in several languages, and sometimes it's hard to connect to "my first programming class".

I posted more below in the answers section :D

3
11.1 years ago
brentp 23k

homework? well, i'll answer anyway using biopython.

from Bio.Seq import Seq

# Bio.Alphabet (and generic_rna) was removed in Biopython 1.78; a plain Seq works.
# add your own logic here to parse the rna sequence from the file.
# split on start codon. drop the part preceding the 1st start codon,
# then for each chunk, translate to the stop codon. then join and print.
print(" ".join(str(Seq("AUG" + rest).translate(to_stop=True))
               for rest in "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA".split("AUG")[1:]))
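
One caveat with splitting on "AUG": the question allows more than one AUG in the sequence, and a split starts a new chunk at every one of them, including an AUG that sits inside a gene. Below is a minimal sketch of an alternative that stays in one reading frame, using only the standard library `re` module. The names `GENE_PATTERN` and `find_genes` are made up for illustration and are not part of any library.

```python
import re

# AUG, then as few whole codons as possible, then UAG, all in one reading frame.
GENE_PATTERN = re.compile(r"AUG(?:[ACGU]{3})*?UAG")


def find_genes(rna):
    """Return the non-overlapping AUG...UAG genes found scanning left to right."""
    return [match.group(0) for match in GENE_PATTERN.finditer(rna)]


print(find_genes("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"))
# ['AUGCUAGAAUAG', 'AUGUACUAG']
```

Each match has a length that is a multiple of three, so it could be passed to `Seq(...).translate(to_stop=True)` as in the answer above.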

2
11.1 years ago

I'm not a Python guy, but the following script does the job with this dna.txt:

>Human
ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA


You just need to check that at least 3 bases are available after the current position:

with open("dna.txt", "r") as myfile:
    data = myfile.readlines()

# Standard genetic code: RNA codon -> one-letter amino acid
codon_table = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
    "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
    "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
    "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
    "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
    "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
    "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
    "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
    "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
    "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
    "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
    "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
    "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
    "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
    "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
    "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

# data[0] is the ">Human" header line; data[1] is the sequence
DNA=data[1].strip()
proteins = []
start = DNA.find('AUG')
while start != -1:
    protein = ""
    # only read a codon if at least 3 bases remain after the current position
    while start+2 < len(DNA):
        codon = DNA[start:start+3]
        start += 3
        if codon == "UAG":
            break
        protein += codon_table[codon]
    proteins.append(protein)
    # look for the next gene after the stop codon
    start = DNA.find('AUG', start)
print(" ".join(proteins))

0

Hi, I ran your code, but there seems to be a problem with it. When I read the data file, the \n characters are also read, and the start variable always seems to get the value -1 even when the substring is present in DNA.

0

when I run your code I get this error:

start, end = next_transcript(mRNA, cur_pos)
TypeError: 'NoneType' object is not iterable


I also noticed that cur_pos isn't defined until later, on line 43. Could this be the problem? I am also not sure why DNA=data[1].strip() asks for the second item (index 1). How is your dna.txt formatted? Clarification would be much appreciated. Thanks
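
On the file format question: the dna.txt shown with the answer above has two lines, a ">Human" header and then the sequence. A small sketch of why index 1 and `strip()` are used, assuming the file is read with `readlines()` (the answer does not show how `data` is filled, so that part is a guess, and `read_sequence` is a hypothetical helper name):

```python
def read_sequence(lines):
    """Return the sequence from a two-line FASTA file given as a list of lines."""
    # lines[0] is the ">Human" header, lines[1] is the sequence itself.
    # strip() removes the trailing "\n" that readlines() keeps on every line.
    return lines[1].strip()


# Stand-in for: with open("dna.txt") as myfile: lines = myfile.readlines()
lines = [">Human\n", "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA\n"]
DNA = read_sequence(lines)
print(DNA.find("AUG"))  # 2
```

If `find` returns -1, one possible cause is that the line being searched is not the sequence line, for example when the file has no header and the sequence is at index 0.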

2
11.1 years ago
Studentguy ▴ 70

Thank you all for the fast responses. To those who helped, I greatly appreciate it. I ended up with an alternate solution.

Here's the code I ended up using. It did away with that loop structure I couldn't get and kind of cheats by using find/rfind to narrow down the two sequences. It won't work for more than two sequences, but it does the job nonetheless.

http://dpaste.org/RyUA/
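
For readers who cannot open the paste, here is a rough guess at what a find/rfind approach could look like. It is not the code behind the link. It assumes the sequence contains at least one AUG and that the rightmost AUG starts the second gene; `gene_at` is a hypothetical helper name.

```python
def gene_at(rna, start):
    """Return the gene beginning at start, read codon by codon up to and including UAG."""
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] == "UAG":
            return rna[start:i + 3]
    return None  # no in-frame UAG after this start codon


rna = "ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA"
first = gene_at(rna, rna.find("AUG"))    # leftmost AUG starts the first gene
second = gene_at(rna, rna.rfind("AUG"))  # rightmost AUG is assumed to start the second
print(first, second)  # AUGCUAGAAUAG AUGUACUAG
```

As noted in the post, this only covers two genes, and it picks the wrong start if the second gene itself contains another AUG.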

5

Nice work with this. Your logic is sound and the major remaining issue to tackle is repetitiveness in the code. Whenever you repeat code with a few changes, you should focus on making that code into a function. Here's a next version of your code generalized with functions and a while loop. It's a combination of your first and second attempts. Notice how the functions allow you to avoid re-writing the same code multiple times: http://gist.github.com/626765
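
To illustrate the advice about functions and a while loop, here is a hypothetical sketch. It is not the contents of the gist. The names `next_gene`, `translate`, and `translate_all` are invented for this example, and the codon table is built from the standard genetic code listed in U, C, A, G order with "*" marking stop codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def next_gene(rna, pos):
    """Return (start, end) of the next AUG...UAG gene at or after pos, or None."""
    start = rna.find("AUG", pos)
    if start == -1:
        return None
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i + 3] == "UAG":
            return start, i + 3
    return None  # a start codon without an in-frame UAG ends the search


def translate(gene):
    """Translate a gene codon by codon, leaving out the final stop codon."""
    return "".join(CODON_TABLE[gene[i:i + 3]] for i in range(0, len(gene) - 3, 3))


def translate_all(rna):
    """Collect the protein for every gene, scanning left to right."""
    proteins = []
    span = next_gene(rna, 0)
    while span is not None:  # check for None before unpacking the tuple
        start, end = span
        proteins.append(translate(rna[start:end]))
        span = next_gene(rna, end)
    return proteins


print(" ".join(translate_all("ACAUGCUAGAAUAGCCGCAUGUACUAGUUAA")))  # MLE MY
```

Because the search and the translation each live in one function, handling a third or fourth gene needs no extra code.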

1
8.2 years ago
viv_bio ▴ 50
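from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def make_trans_record(record):
    """Return a new SeqRecord holding the translated sequence."""
    # The slice [350:-103] is specific to the original poster's data;
    # adjust or remove it for your own sequences.
    return SeqRecord(seq=record.seq[350:-103].translate(),
                     id=record.id,
                     description="")

# input() replaces Python 2's raw_input()
input_path = input("Enter file location of the nucleotide sequence: ")
output_path = input("Output file location and name: ")

# Translate every record in the FASTA file and write the results out as FASTA
records = map(make_trans_record, SeqIO.parse(input_path, "fasta"))

SeqIO.write(records, output_path, "fasta")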

0
10.1 years ago
User 4133 ▴ 150

Probably this can help.

It is in Italian, but the python code is in English.

Bye Wollo