DNA to protein in Python
7.5 years ago
ashkan ▴ 160

Hi Guys,

I have a list of DNA sequences like this (a very small example):

> seq = ['ATGGCGGCGCGA', 'GCCTCTGCCTTG', 'CTGAAAACG']

The length of each sequence is a multiple of 3. I also have this dictionary, which maps codons to amino acids:

gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

I want to replace each codon (3 characters) with its amino acid (its value in the dictionary above). The result for the small example would look like this:

AA : ['MAAR', 'ASAL', 'LKT']

Do you guys know how to do that?

next-gen • 4.5k views
7.5 years ago

Biopython has a translate function, so you don't have to code this yourself. See, for example, http://biopython.org/wiki/Seq and the tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html
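
If you would rather stay with the dictionary from the question, here is a minimal sketch in plain Python. The function name `translate`, the `'X'` placeholder for codons missing from the table, and the choice to ignore a trailing partial codon are my own assumptions, not something from the question.

```python
# Codon table copied from the question; '_' marks a stop codon.
gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M', 'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K', 'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L', 'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q', 'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V', 'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E', 'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S', 'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_', 'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}


def translate(dna, table=gencode):
    """Translate a DNA string codon by codon using a lookup table.

    Hypothetical helper for illustration; not part of any library.
    """
    dna = dna.upper()
    # Only walk over complete codons; a trailing partial codon is ignored.
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    # Assumption: codons not in the table (e.g. containing 'N') become 'X'.
    return ''.join(table.get(codon, 'X') for codon in codons)


seq = ['ATGGCGGCGCGA', 'GCCTCTGCCTTG', 'CTGAAAACG']
AA = [translate(s) for s in seq]
print(AA)  # ['MAAR', 'ASAL', 'LKT']
```

With Biopython installed, `str(Seq(s).translate())` (with `Seq` imported from `Bio.Seq`) should give the same letters for these sequences, except that Biopython writes stop codons as `*` rather than `_`.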
