4.8 years ago
alsxm9608220073
I want to create a program that translates DNA sequences into amino acid sequences, so I used Biopython. Here's what I have so far:
import sys
from Bio import SeqIO
import warnings
warnings.filterwarnings("ignore")
filename = raw_input("Enter filename : ")
myfile = open(filename)
fasta_in=filename
for record in SeqIO.parse(fasta_in,'fasta'):
    id_part=record.id
    desc_part=record.description
    seq = record.seq
    print('seq:',seq)
print(seq);
frame_num = raw_input("Enter frame number: ")
if (frame_num[0] == '+'):
    for codon in [seq[i:i+3] for i in range(int(frame_num[1])-1, len(seq), 3)]:
        print seq.translate()
        sys.exit()
else:
    for codon in [seq[i-3:i] for i in range(len(seq)-int(frame_num[1])+1, 0, -3)]:
        print seq.translate()
        sys.exit
I ran this program, but it printed the same amino acid translation for frames +1 and -2, like this:
Enter filename : ringo.txt
('seq:', Seq('TTCAGGAGTGGAACGCACGCCAGCGACGTCCAAGAAGCCTTGAAACAGTTCGTC...TTT', SingleLetterAlphabet()))
('seq:', Seq('ATGGGAAGAAGGCGAAGTCATGAGCGCCGGGATTTACCCCCTAACCTTTATATA...TAA', SingleLetterAlphabet()))ATGGGAAGAAGGCGAAGTCATGAGCGCCGGGATTTACCCCCTAACCTTTATATAAGAAACAATGGATATTACTGCTACAGGGACCCAAGGACGGGTAAAGAGTTTGGATTAGGCAGAGACAGGCGAATCGCAATCACTGAAGCTATACAGGCCAACATTGAGTTATTTTCAGGACACAAACACAAGCCTCTGACAGCGAGAATCAACAGTGATAATTCCGTTACGTTACATTCATGGCTTGATCGCTACGAAAAAATCCTGGCCAGCAGAGGAATCAAGCAGAAGACACTCATAAATTACATGAGCAAAATTAAAGCAATAAGGAGGGGTCTGCCTGATGCTCCACTTGAAGACATCACCACAAAAGAAATTGCGGCAATGCTCAATGGATACATAGACGAGGGCAAGGCGGCGTCAGCCAAGTTAATCAGATCAACACTGAGCGATGCATTCCGAGAGGCAATAGCTGAAGGCCATATAACAACAAACCATGTCGCTGCCACTCGCGCAGCAAAATCAGAGGTAAGGAGATCAAGACTTACGGCTGACGAATACCTGAAAATTTATCAAGCAGCAGAATCATCACCATGTTGGCTCAGACTTGCAATGGAACTGGCTGTTGTTACCGGGCAACGAGTTGGTGATTTATGCGAAATGAAGTGGTCTGATATCGTAGATGGATATCTTTATGTCGAGCAAAGCAAAACAGGCGTAAAAATTGCCATCCCAACAGCATTGCATATTGATGCTCTCGGAATATCAATGAAGGAAACACTTGATAAATGCAAAGAGATTCTTGGCGGAGAAACCATAATTGCATCTACTCGTCGCGAACCGCTTTCATCCGGCACAGTATCAAGGTATTTTATGCGCGCACGAAAAGCATCAGGTCTTTCCTTCGAAGGGGATCCGCCTACCTTTCACGAGTTGCGCAGTTTGTCTGCAAGACTCTATGAGAAGCAGATAAGCGATAAGTTTGCTCAACATCTTCTCGGGCATAAGTCGGACACCATGGCATCACAGTATCGTGATGACAGAGGCAGGGAGTGGGACAAAATTGAAATCAAATAA
Enter frame number: +1
MGRRRSHERRDLPPNLYIRNNGYYCYRDPRTGKEFGLGRDRRIAITEAIQANIELFSGHKHKPLTARINSDNSVTLHSWLDRYEKILASRGIKQKTLINYMSKIKAIRRGLPDAPLEDITTKEIAAMLNGYIDEGKAASAKLIRSTLSDAFREAIAEGHITTNHVAATRAAKSEVRRSRLTADEYLKIYQAAESSPCWLRLAMELAVVTGQRVGDLCEMKWSDIVDGYLYVEQSKTGVKIAIPTALHIDALGISMKETLDKCKEILGGETIIASTRREPLSSGTVSRYFMRARKASGLSFEGDPPTFHELRSLSARLYEKQISDKFAQHLLGHKSDTMASQYRDDRGREWDKIEIK*
And for '-2':
Enter filename : ringo.txt
('seq:', Seq('TTCAGGAGTGGAACGCACGCCAGCGACGTCCAAGAAGCCTTGAAACAGTTCGTC...TTT', SingleLetterAlphabet()))
('seq:', Seq('ATGGGAAGAAGGCGAAGTCATGAGCGCCGGGATTTACCCCCTAACCTTTATATA...TAA', SingleLetterAlphabet()))ATGGGAAGAAGGCGAAGTCATGAGCGCCGGGATTTACCCCCTAACCTTTATATAAGAAACAATGGATATTACTGCTACAGGGACCCAAGGACGGGTAAAGAGTTTGGATTAGGCAGAGACAGGCGAATCGCAATCACTGAAGCTATACAGGCCAACATTGAGTTATTTTCAGGACACAAACACAAGCCTCTGACAGCGAGAATCAACAGTGATAATTCCGTTACGTTACATTCATGGCTTGATCGCTACGAAAAAATCCTGGCCAGCAGAGGAATCAAGCAGAAGACACTCATAAATTACATGAGCAAAATTAAAGCAATAAGGAGGGGTCTGCCTGATGCTCCACTTGAAGACATCACCACAAAAGAAATTGCGGCAATGCTCAATGGATACATAGACGAGGGCAAGGCGGCGTCAGCCAAGTTAATCAGATCAACACTGAGCGATGCATTCCGAGAGGCAATAGCTGAAGGCCATATAACAACAAACCATGTCGCTGCCACTCGCGCAGCAAAATCAGAGGTAAGGAGATCAAGACTTACGGCTGACGAATACCTGAAAATTTATCAAGCAGCAGAATCATCACCATGTTGGCTCAGACTTGCAATGGAACTGGCTGTTGTTACCGGGCAACGAGTTGGTGATTTATGCGAAATGAAGTGGTCTGATATCGTAGATGGATATCTTTATGTCGAGCAAAGCAAAACAGGCGTAAAAATTGCCATCCCAACAGCATTGCATATTGATGCTCTCGGAATATCAATGAAGGAAACACTTGATAAATGCAAAGAGATTCTTGGCGGAGAAACCATAATTGCATCTACTCGTCGCGAACCGCTTTCATCCGGCACAGTATCAAGGTATTTTATGCGCGCACGAAAAGCATCAGGTCTTTCCTTCGAAGGGGATCCGCCTACCTTTCACGAGTTGCGCAGTTTGTCTGCAAGACTCTATGAGAAGCAGATAAGCGATAAGTTTGCTCAACATCTTCTCGGGCATAAGTCGGACACCATGGCATCACAGTATCGTGATGACAGAGGCAGGGAGTGGGACAAAATTGAAATCAAATAA
Enter frame number: -2
MGRRRSHERRDLPPNLYIRNNGYYCYRDPRTGKEFGLGRDRRIAITEAIQANIELFSGHKHKPLTARINSDNSVTLHSWLDRYEKILASRGIKQKTLINYMSKIKAIRRGLPDAPLEDITTKEIAAMLNGYIDEGKAASAKLIRSTLSDAFREAIAEGHITTNHVAATRAAKSEVRRSRLTADEYLKIYQAAESSPCWLRLAMELAVVTGQRVGDLCEMKWSDIVDGYLYVEQSKTGVKIAIPTALHIDALGISMKETLDKCKEILGGETIIASTRREPLSSGTVSRYFMRARKASGLSFEGDPPTFHELRSLSARLYEKQISDKFAQHLLGHKSDTMASQYRDDRGREWDKIEIK*
I want to make a program that works like this:
Ask the user for a file name in FASTA format. Open the file (while checking for errors). Input the frame number (+1, +2, +3, -1, -2, -3) from the user. Output the amino acid sequence when read from the input frame.
I don't know what to fix. If you know a good solution, please let me know.
Hi, is this a homework/assignment question?
Just FYI, OP, I’ve edited your tags (the # are not necessary) to make it easier for people watching the tags to find your post, and I’ve altered your title to be a bit more specific to the question. If you would like to rephrase my change, please feel free, but try to keep the title descriptive of the thread content.
BioPython is smart enough that you don’t need to get your hands quite so dirty with the codons and frames.
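As far as I can tell, the reason +1 and -2 print the same thing is that the loop builds codon slices but then calls seq.translate() on the whole, untouched sequence and exits on the first pass, so the frame never affects the output. Below is a minimal sketch of one way to let Biopython do the work instead. The helper name translate_frame is hypothetical (not part of Biopython), and it assumes the common convention that negative frames are read from the start of the reverse complement; other tools may number negative frames differently.

```python
from Bio.Seq import Seq

def translate_frame(seq, frame):
    """Translate `seq` in one of the six reading frames (+1..+3, -1..-3).

    Hypothetical helper for illustration; assumes negative frames start
    at the first base of the reverse complement.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = Seq(str(seq))
    # Negative frames are read from the reverse complement strand.
    if frame < 0:
        seq = seq.reverse_complement()
    # Skip 0, 1 or 2 bases to reach the requested frame.
    sub = seq[abs(frame) - 1:]
    # Drop trailing bases that do not form a complete codon.
    sub = sub[: len(sub) - len(sub) % 3]
    return str(sub.translate())

print(translate_frame("ATGGCCTAA", +1))
print(translate_frame("ATGGCCTAA", -2))
```

The point of the sketch is that the slicing happens once, before translate() is called, so no per-codon loop is needed.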
There is a whole module of sequence utilities which includes a 6 frame translator, so you can translate everything at once:
https://biopython.org/DIST/docs/api/Bio.SeqUtils-module.html#six_frame_translations
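For reference, a minimal sketch of calling that function. The example sequence is made up, and this assumes a Biopython version in which Bio.SeqUtils.six_frame_translations is available and accepts a plain string. It returns one formatted text report covering all six frames rather than separate sequences.

```python
from Bio.SeqUtils import six_frame_translations

# Made-up example sequence, for illustration only.
dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

# The result is a single multi-line string intended for display.
report = six_frame_translations(dna)
print(report)
```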
While I like that util, I think its main purpose is to prettify things. I find it difficult to work with the string representation of the translations.
True, though OP could probably extract the relevant bits of the source for their problem. Looping over each triplet individually to translate them doesn’t strike me as efficient, robust, or necessary.
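The question also asks for error checking on the file name and a frame restricted to +1, +2, +3, -1, -2, -3. A rough sketch of that input handling in plain Python follows. The names VALID_FRAMES, parse_frame and check_fasta_path are hypothetical, and the FASTA check (first line starts with '>') is only a simple heuristic, not a full format validation.

```python
import os

# The six frame labels the question allows.
VALID_FRAMES = ("+1", "+2", "+3", "-1", "-2", "-3")

def parse_frame(text):
    """Return the frame as a signed int, or raise ValueError if invalid."""
    text = text.strip()
    if text not in VALID_FRAMES:
        raise ValueError("frame must be one of: " + ", ".join(VALID_FRAMES))
    return int(text)  # int() accepts a leading '+' or '-'

def check_fasta_path(path):
    """Return `path` if it names a file that looks like FASTA, else raise IOError."""
    if not os.path.isfile(path):
        raise IOError("no such file: %r" % path)
    with open(path) as handle:
        first_line = handle.readline()
    # Heuristic only: FASTA records begin with a '>' header line.
    if not first_line.startswith(">"):
        raise IOError("%r does not look like FASTA (no '>' header)" % path)
    return path

print(parse_frame("+1"))
print(parse_frame("-2"))
```

In a script, these would wrap the two prompts: pass the file name answer to check_fasta_path() before handing it to SeqIO.parse, and pass the frame answer to parse_frame(), catching the exceptions to print a message and ask again.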