7.8 years ago
oki4
Goal: write a program that translates a DNA sequence, given in a GenBank-format file called sequence.gb, into all six reading frames. We are given template (starter) code to work with.
GenBank input file: http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb
My code:
from urllib.request import urlopen

def dna2rna(sequence):
    '''The dna2rna function converts a sequence of DNA, given as a
    parameter, and returns an RNA sequence.
    '''
    ## GenBank sequences and the codon2aa keys are lowercase, so
    ## lowercase the input and replace 't' with 'u'.
    rna_seq = sequence.lower().replace('t', 'u')
    return rna_seq

codon2aa = {'aaa': 'K', 'aac': 'N', 'aag': 'K', 'aau': 'N',
'aca': 'T', 'acc': 'T', 'acg': 'T', 'acu': 'T',
'aga': 'R', 'agc': 'S', 'agg': 'R', 'agu': 'S',
'aua': 'I', 'auc': 'I', 'aug': 'M', 'auu': 'I',
'caa': 'Q', 'cac': 'H', 'cag': 'Q', 'cau': 'H',
'cca': 'P', 'ccc': 'P', 'ccg': 'P', 'ccu': 'P',
'cga': 'R', 'cgc': 'R', 'cgg': 'R', 'cgu': 'R',
'cua': 'L', 'cuc': 'L', 'cug': 'L', 'cuu': 'L',
'gaa': 'E', 'gac': 'D', 'gag': 'E', 'gau': 'D',
'gca': 'A', 'gcc': 'A', 'gcg': 'A', 'gcu': 'A',
'gga': 'G', 'ggc': 'G', 'ggg': 'G', 'ggu': 'G',
'gua': 'V', 'guc': 'V', 'gug': 'V', 'guu': 'V',
'uaa': '_', 'uac': 'Y', 'uag': '_', 'uau': 'Y',
'uca': 'S', 'ucc': 'S', 'ucg': 'S', 'ucu': 'S',
'uga': '_', 'ugc': 'C', 'ugg': 'W', 'ugu': 'C',
'uua': 'L', 'uuc': 'F', 'uug': 'L', 'uuu': 'F'}

if __name__ == '__main__':
    with urlopen('https://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb') as conn:
        data = conn.readlines()
    lines = [datum.decode().strip() for datum in data]
    flag = False
    dna = ''
    for line in lines:
        ## if the flag is True, append the line to 'dna'
        ## (strings have no append(), so use += instead)
        if flag:
            dna += line
        ## if the word "ORIGIN" is in the line, set 'flag' to True
        if 'ORIGIN' in line:
            flag = True
    ## gets rid of any non-dna character (digits, spaces, the closing '//')
    dna = dna.translate(str.maketrans('', '', '0123456789 /'))
    ## calls the dna2rna function
    rna = dna2rna(dna)
    ## process the first 3 reading frames
    ## (this loop is the bolded section I'm asking about)
    for i in range(3):
        if rna[0:3] in codon2aa:
            ## create a variable 'seq' and assign it the rna to process
            seq = ''
            amino = ''
            while len(seq) >= 3:
                ## use the codon2aa table to append an amino acid to 'amino'
                ## update 'seq' to the next codon
                pass
            print('--- Reading Frame %i ---' % (i+1), amino, sep='\n')
    ##
    ## ## compute the reverse complement of 'rna' and assign the result
    ## ## back into the 'rna' variable
    ##
    ## ## process the next 3 reading frames. hint: just like the first 3
    ## for i in range(3):
    ##     ## same as the first 3
    ##     print('--- Reading Frame %i ---' % (i+4), amino, sep='\n')
    ##
I would like to know if I'm on the right track so far. I'm also having trouble processing the first three reading frames (the bolded section, starting at "process the first 3 reading frames") and would appreciate some input. Thanks.
Have you been instructed not to use a library like Biopython?
This can be accomplished pretty easily with SeqIO and the built-in translate() from Biopython.
No, I can't use Biopython, unfortunately.
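Without Biopython, the loop over the first three reading frames can be filled in with the codon table alone. Here is a minimal sketch, assuming lowercase RNA input and '_' for stop codons, as in the codon2aa table in the question. The names build_codon_table and translate_frame are made up for illustration, and the table is generated compactly rather than typed out. Rendering unknown codons (anything outside a/c/g/u) as 'X' is my own assumption, not something the assignment specifies.

```python
from itertools import product

def build_codon_table():
    """Standard genetic code as a dict shaped like codon2aa (hypothetical helper)."""
    bases = 'ucag'
    # Amino acids in the conventional u, c, a, g ordering; '_' marks stop codons.
    aminos = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    codons = (''.join(triplet) for triplet in product(bases, repeat=3))
    return dict(zip(codons, aminos))

def translate_frame(rna, offset, table):
    """Translate one reading frame of `rna`, starting at `offset` (0, 1 or 2)."""
    seq = rna[offset:]          # the rna to process for this frame
    amino = ''
    while len(seq) >= 3:
        # Look up the current codon; 'X' for unknown codons is an assumption.
        amino += table.get(seq[:3], 'X')
        seq = seq[3:]           # move on to the next codon
    return amino

if __name__ == '__main__':
    table = build_codon_table()
    rna = 'augaaauagc'          # toy input, not the assignment's sequence
    for i in range(3):
        print('--- Reading Frame %i ---' % (i + 1),
              translate_frame(rna, i, table), sep='\n')
```

The key point is that the frame number only changes where the slicing starts (rna[i:]); the while loop itself is identical for every frame.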
Hi,
Did you ever find an answer to the project?
The answer is probably here: C: Beginner in Python- translating DNA given in GenBank file format into its six re
Is that hyperlink meant to take me back to this page?
It takes you to the comment made by Eric Lim, suggesting that you use translate() from Biopython.
The Biopython cookbook actually shows how to do this, but instead of calling translate() you could just look each codon up in your own table.
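The "use your own table" idea extends to all six frames, since frames 4 to 6 are just frames 1 to 3 of the reverse complement. Below is a rough sketch of that, not the cookbook's code. The names reverse_complement, translate and six_frames are illustrative, the input is assumed to be lowercase RNA, and the 'X' placeholder for unknown codons is an assumption.

```python
from itertools import product

# Standard genetic code keyed by lowercase RNA codons; '_' marks stop codons.
CODON2AA = dict(zip(
    (''.join(t) for t in product('ucag', repeat=3)),
    'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'))

def reverse_complement(rna):
    """Complement each base (a<->u, c<->g), then reverse the result."""
    return rna.translate(str.maketrans('acgu', 'ugca'))[::-1]

def translate(rna):
    """Translate from position 0, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    # 'X' for codons missing from the table is an assumption.
    return ''.join(CODON2AA.get(rna[i:i + 3], 'X') for i in range(0, usable, 3))

def six_frames(rna):
    """Frames 1-3 on the given strand, then 4-6 on its reverse complement."""
    rev = reverse_complement(rna)
    return [translate(strand[offset:])
            for strand in (rna, rev)
            for offset in range(3)]

if __name__ == '__main__':
    # Toy input, not the assignment's sequence.
    for number, protein in enumerate(six_frames('augaaauagc'), start=1):
        print('--- Reading Frame %i ---' % number, protein, sep='\n')
```

If the sequence is kept as DNA instead, the same approach works with 'acgt' -> 'tgca' in the complement step and a table keyed by DNA codons.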