Question: Translation Of Nucleotide To Amino Acid
3
6.5 years ago by
figo200
figo200 wrote:

Hi all,

I need to translate the nucleotide sequences in a multi-FASTA file into amino acids in all six reading frames and select the reading frame that best represents each nucleotide sequence. If anyone has a Perl script or knows of a tool that can do this, it would be a great help.

Regards

translation • 13k views
ADD COMMENTlink modified 4 months ago by Kzra10 • written 6.5 years ago by figo200
6
6.5 years ago by
JC7.7k
Mexico
JC7.7k wrote:

Emboss has sixpack to do that: http://emboss.sourceforge.net/apps/cvs/emboss/apps/sixpack.html

ADD COMMENTlink written 6.5 years ago by JC7.7k

Hello, the Emboss util doesn't work:

@BioPower3-IBM ~/Bacteria/cufflinks/older $ sixpack -mstart yes -orfminsize 300 -sequence diff_expressed.fasta -outfile diff_expressed.sixpack -outseq diff_expressed_AA.fasta -mstart -orfminsize 50
Display a DNA sequence with 6-frame translation and ORFs
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 20:07:41
@BioPower3-IBM ~/Bacteria/cufflinks/older $ head diff_expressed_AA.fasta
>16451-16691(-)_1_ORF1  Translation of 16451-16691(-) in frame 1, ORF 1, threshold 50, 79aa
MKKLLFLFFALTAFLFGAVNINTATLKELKSLNGIGEAKAKAILEYRKEANFTSIDDLKK
VKGIGDKLFEKIKNDITIE
>16451-16691(-)_3_ORF1  Translation of 16451-16691(-) in frame 3, ORF 1, threshold 50, 37aa
EKITIFIFCFNGFSLWCCKYQHCNTKRIKKFKWYWRS
>16451-16691(-)_4_ORF1  Translation of 16451-16691(-) in frame 4, ORF 1, threshold 50, 36aa
LFYCDIIFDFFKKLITYAFNFFKIINTCKICFFAVF
>16451-16691(-)_5_ORF1  Translation of 16451-16691(-) in frame 5, ORF 1, threshold 50, 3aa
ILL
>16451-16691(-)_6_ORF1  Translation of 16451-16691(-) in frame 6, ORF 1, threshold 50, 64aa

Even though a minimum size for ORFs is set, smaller ORFs are reported, and the first AA is not always M, but it should be.
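One way to enforce both conditions after the fact is to post-filter the protein FASTA file. This is only a minimal sketch, assuming a plain FASTA file; `read_fasta`, `filter_orfs` and the record names `orf_a`/`orf_b` are hypothetical names for illustration and are not part of EMBOSS:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_orfs(records, min_len=50, require_met=True):
    """Keep ORFs of at least min_len residues that (optionally) start with M."""
    kept = []
    for header, seq in records:
        if len(seq) < min_len:
            continue
        if require_met and not seq.startswith("M"):
            continue
        kept.append((header, seq))
    return kept


# Two records taken from the output above, with shortened (made-up) headers.
example = """>orf_a
MKKLLFLFFALTAFLFGAVNINTATLKELKSLNGIGEAKAKAILEYRKEANFTSIDDLKK
VKGIGDKLFEKIKNDITIE
>orf_b
ILL
""".splitlines()

for header, seq in filter_orfs(read_fasta(example)):
    print(f">{header}\n{seq}")
```

For a real file, pass an open file handle to `read_fasta` instead of the `example` list.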

ADD REPLYlink modified 4.1 years ago • written 4.1 years ago by apelin20470
1
6.5 years ago by
vaskin90280
Milan, Italy
vaskin90280 wrote:

You may use UGENE Workflow Designer. No scripts needed, only GUI manipulations:

1) Download UGENE: http://ugene.unipro.ru/
2) Open Workflow Designer from the Tools menu.
3) Create a scheme with the following elements: "Read Sequence" -> "Amino Translation" -> "Write sequence".

ADD COMMENTlink modified 6.5 years ago • written 6.5 years ago by vaskin90280
1
3.5 years ago by
x.jack.min20
x.jack.min20 wrote:

Try OrfPredictor: it reports the best frame as well as all six frames, and also retrieves the CDS.

http://proteomics.ysu.edu/tools/OrfPredictor.html

 

ADD COMMENTlink written 3.5 years ago by x.jack.min20
0
6.5 years ago by
Sukhdeep Singh9.7k
Netherlands
Sukhdeep Singh9.7k wrote:

Try this Python script from this resource; I haven't tested it, though.

[translatedna.py] read a FASTA file, do 6-frame translation, print best protein seqs as FASTA

ADD COMMENTlink written 6.5 years ago by Sukhdeep Singh9.7k
1

Beware: "haven't tested" is a big risk if there is in fact a mistake or bug. In general, for routine tasks like this, in my opinion no new code should be written at all, given that highly stable and well-tested code already exists. I stress this because a single error, e.g. a typo in the genetic code matrix, can taint all of your downstream analysis.
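To illustrate the kind of check that catches such a typo, here is a minimal sketch that compares a hand-typed codon dictionary against the standard genetic code (NCBI translation table 1) written in its compact TCAG-ordered form. The names `REFERENCE` and `table_errors` are hypothetical, and the `_` stop symbol is an assumption matching the scripts posted in this thread:

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI translation table 1),
# listed with the third codon position varying fastest, bases in TCAG order.
STANDARD_AAS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

REFERENCE = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}


def table_errors(table, stop_symbol="_"):
    """Return (codon, found, expected) for every entry that disagrees."""
    errors = []
    for codon, expected in REFERENCE.items():
        expected = stop_symbol if expected == "*" else expected
        found = table.get(codon)  # None if the codon is missing entirely
        if found != expected:
            errors.append((codon, found, expected))
    return errors


# Demonstration: a correct table, and a copy with one deliberate typo.
good = {codon: ("_" if aa == "*" else aa) for codon, aa in REFERENCE.items()}
typo = dict(good, TGG="C")

print(table_errors(good))  # []
print(table_errors(typo))  # [('TGG', 'C', 'W')]
```

A check like this does not replace using a well-tested library, but it is cheap to run on any hand-written table before trusting its output.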

ADD REPLYlink written 6.5 years ago by Michael Dondrup46k

Yeah, I agree, but it was a pointer in the right direction; that's why I mentioned that "I haven't tested the code". The user should at least bear responsibility for testing and usage, although that might not happen if someone is new to the subject and unable to check the result.

ADD REPLYlink written 6.5 years ago by Sukhdeep Singh9.7k
0
6.5 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

There are multiple tools available online to do this. Did you try "translate 6 frames" as a Google search? Here's the top hit.

ADD COMMENTlink written 6.5 years ago by Neilfws48k
0
6.5 years ago by
shane.neeley50
Portland, Oregon
shane.neeley50 wrote:

Here is some python to translate all six frames of DNA.

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

def translate_frameshifted( sequence ):
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(beans[::-1][0:]))    # first frame from end (reversed only, not complemented)
print(translate_frameshifted(beans[::-1][1:]))    # second frame from end (reversed only, not complemented)
print(translate_frameshifted(beans[::-1][2:]))    # third frame from end (reversed only, not complemented)

# This ::-1 syntax in python means reverse the string.

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
FYW_PSVWTKFSK_QVFVS
SIGSRRYGPNSVNNKSLCQ
LLVAVGMDQIQ_ITSLCVS
ADD COMMENTlink modified 6.5 years ago • written 6.5 years ago by shane.neeley50

This code is nice, but it translates only frames 1 to 3 correctly. For the -1 to -3 frames, the reverse complement DNA is required.
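The difference between reversing and reverse-complementing is easy to see on a short sequence. A minimal sketch, assuming uppercase DNA with only A, C, G and T; the helper name `reverse_complement` and the toy sequence are chosen here for illustration:

```python
# Translation table that swaps each base for its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


seq = "ATGGCC"
print(seq[::-1])                # CCGGTA  (reversed only)
print(reverse_complement(seq))  # GGCCAT  (the opposite strand, read 5' to 3')
```

Only the second string is what the opposite strand actually encodes, so the negative frames have to be translated from it.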

ADD REPLYlink written 3.6 years ago by iontrap.ms0
0
3.6 years ago by
iontrap.ms0
United States
iontrap.ms0 wrote:

Here is the corrected version of shane.neeley's code, which is otherwise very elegant and useful:

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

def translate_frameshifted( sequence ):
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

def reverse_complement( sequence ):
      reversed_sequence = (sequence[::-1])
      rc = ''.join([basepairs.get(reversed_sequence[i], 'X') for i in range(len(sequence))])
      return rc

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(reverse_complement(beans)))    # negative first frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-1])))    # negative second frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-2])))    # negative third frame

# This ::-1 syntax in python means reverse the string.

 

 

ADD COMMENTlink written 3.6 years ago by iontrap.ms0
0
3.6 years ago by
iontrap.ms0
United States
iontrap.ms0 wrote:

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
KITIGSHTWFKSFIVQKHS
R_PSAAIPGLSHLLFRNTV
DNHRQPYLV_VIYCSETQS

ADD COMMENTlink written 3.6 years ago by iontrap.ms0
0
4 months ago by
Kzra10
University of British Columbia
Kzra10 wrote:

I have written a program in Python 3 that takes a nucleotide FASTA file as input and translates each sequence in that file in the frame that produces the fewest stop codons.

The software translates each sequence in all six frames and counts the number of stop codons in each. It then writes the original sequence in the 'optimal' frame to an output FASTA file, with the frame name appended to the contig name. When several frames are equally optimal, Optimal Translate writes each of them to the output FASTA file. It only needs Python 3 to run and works on both DNA and RNA sequences.

The software can be run in Python or as a Geneious 11 plugin on Windows.

You can access it here: https://github.com/Kzra/Optimal-Translate
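The "fewest stop codons" idea described above can be sketched in a few lines. This is not the code from the repository, only an illustration under stated assumptions: the standard genetic code, `*` as the stop symbol, and hypothetical function names `translate`, `six_frames` and `best_frames`. The test sequence is the one used earlier in this thread.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), third base varying fastest, TCAG order.
AAS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {frame: protein} for frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    rc = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def best_frames(seq):
    """Return the frame(s) with the fewest stop codons, ties included."""
    frames = six_frames(seq)
    fewest = min(p.count("*") for p in frames.values())
    return {f: p for f, p in frames.items() if p.count("*") == fewest}


beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"
print(best_frames(beans))  # {-1: 'KITIGSHTWFKSFIVQKHS'}
```

Counting stops is only a heuristic: on short sequences several frames can tie, and a frame with no stop codon is not necessarily the biologically correct one.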

ADD COMMENTlink written 4 months ago by Kzra10