Translation Of Nucleotide To Amino Acid
10.1 years ago
figo ▴ 220

Hi all,

I need to translate the nucleotide sequences in a multi-FASTA file into amino acids in all six reading frames, and then select the reading frame that best represents each nucleotide sequence. Does anyone have a Perl script, or know of a tool, that can do this? Any help would be appreciated.

Regards

translation • 20k views
10.1 years ago
JC 13k

Emboss has sixpack to do that: http://emboss.sourceforge.net/apps/cvs/emboss/apps/sixpack.html


Hello, the Emboss util doesn't work:

@BioPower3-IBM ~/Bacteria/cufflinks/older $ sixpack -mstart yes -orfminsize 300 -sequence diff_expressed.fasta -outfile diff_expressed.sixpack -outseq diff_expressed_AA.fasta -mstart -orfminsize 50
Display a DNA sequence with 6-frame translation and ORFs
-------------------- 20:07:41
@BioPower3-IBM ~/Bacteria/cufflinks/older $ head diff_expressed_AA.fasta
>16451-16691(-)_1_ORF1  Translation of 16451-16691(-) in frame 1, ORF 1, threshold 50, 79aa
MKKLLFLFFALTAFLFGAVNINTATLKELKSLNGIGEAKAKAILEYRKEANFTSIDDLKK
VKGIGDKLFEKIKNDITIE
>16451-16691(-)_3_ORF1  Translation of 16451-16691(-) in frame 3, ORF 1, threshold 50, 37aa
EKITIFIFCFNGFSLWCCKYQHCNTKRIKKFKWYWRS
>16451-16691(-)_4_ORF1  Translation of 16451-16691(-) in frame 4, ORF 1, threshold 50, 36aa
LFYCDIIFDFFKKLITYAFNFFKIINTCKICFFAVF
>16451-16691(-)_5_ORF1  Translation of 16451-16691(-) in frame 5, ORF 1, threshold 50, 3aa
ILL
>16451-16691(-)_6_ORF1  Translation of 16451-16691(-) in frame 6, ORF 1, threshold 50, 64aa

Even though a minimum ORF size is set, smaller ORFs are reported, and the first amino acid is not always M, although it should be.
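
The behaviour expected here (only ORFs of a minimum length that start with M) can be sketched as a post-filter on one translated frame. This is a minimal sketch and not how sixpack works internally: `find_orfs`, `min_len` and `stop` are hypothetical names, and the stop character defaults to `_` only to match the Python code posted in this thread.

```python
def find_orfs(protein, min_len=50, stop="_"):
    """Return M-initiated stretches of at least min_len residues.

    protein: one translated reading frame, with `stop` marking stop codons.
    The last stretch is kept even if no stop codon follows it.
    """
    orfs = []
    for segment in protein.split(stop):
        start = segment.find("M")  # first methionine in this stop-free stretch
        if start != -1 and len(segment) - start >= min_len:
            orfs.append(segment[start:])
    return orfs


frame = "_LCF_TINDLNQVWLPMVI"
print(find_orfs(frame, min_len=3))  # ['MVI']
print(find_orfs(frame, min_len=4))  # []
```

Whether a stretch that runs off the end of the sequence should count as an ORF is a design choice; the sketch keeps it.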

10.1 years ago
vaskin90 ▴ 290

You may use the UGENE Workflow Designer. No scripts are needed, only GUI manipulations:

1) Download UGENE: http://ugene.unipro.ru/
2) Open Workflow Designer from the Tools menu.
3) Create a scheme with the following elements: "Read Sequence" -> "Amino Translation" -> "Write sequence".

7.2 years ago
iontrap.ms ▴ 10

Here is a corrected version of the code from shane.neeley, which is otherwise very elegant and useful:

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

basepairs = {'A':'T', 'C':'G', 'G':'C', 'T':'A'}

def translate_frameshifted( sequence ):
      # translate complete codons only; unknown codons become 'X'
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

def reverse_complement( sequence ):
      # reverse the string, then complement each base ('X' for anything but A, C, G, T)
      reversed_sequence = (sequence[::-1])
      rc = ''.join([basepairs.get(reversed_sequence[i], 'X') for i in range(len(sequence))])
      return rc

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(reverse_complement(beans)))    # negative first frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-1])))    # negative second frame
print(translate_frameshifted(reverse_complement(beans[:len(beans)-2])))    # negative third frame

# This ::-1 syntax in python means reverse the string.
7.1 years ago
x.jack.min ▴ 20

Try OrfPredictor: it gives you the best frame as well as all six frames, and it also retrieves the CDS.

http://proteomics.ysu.edu/tools/OrfPredictor.html

10.1 years ago

Try this Python script from this resource; I haven't tested it, though.

[translatedna.py] Read a FASTA file, do a 6-frame translation, and print the best protein sequences as FASTA.
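
The first step that script describes, reading a multi-FASTA file, could look roughly like this in plain Python. This is a minimal sketch and is not taken from translatedna.py: `read_fasta` is a hypothetical helper, and the example records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up example input; with a real file use: with open(path) as handle: ...
example = io.StringIO(">seq1 demo\nATGAAA\nTTT\n>seq2\nGGGCCC\n")
for name, seq in read_fasta(example):
    print(name, seq)
```

Each sequence can then be passed to whatever six-frame translation routine is used.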


Beware: "haven't tested" is a big risk if there is in fact a mistake or bug. In general, for routine tasks like this, I think no new code should be written at all, given that highly stable and well-tested code already exists. I stress this because a single error, e.g. a typo in the genetic code matrix, can taint all of your downstream analysis.
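
If a hand-typed codon table is used anyway, a few cheap checks catch the most common typos. The sketch below is only an illustration under stated assumptions: it assumes the standard genetic code (NCBI translation table 1) with `*` for stop, and `check_codon_table` is a hypothetical helper name. Passing these checks does not prove a table is correct.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def check_codon_table(table, stop="*"):
    """Raise AssertionError if the table fails basic checks for the standard code."""
    counts = Counter(table.values())
    assert len(table) == 64, "expected 64 codons"
    assert set("".join(table)) == set("ACGT"), "unexpected base letters"
    assert counts[stop] == 3, "expected 3 stop codons"
    assert counts["M"] == 1 and counts["W"] == 1, "M and W have one codon each"
    assert counts["L"] == 6 and counts["S"] == 6 and counts["R"] == 6
    assert len(counts) == 21, "expected 20 amino acids plus stop"


check_codon_table(CODON_TABLE)
print(CODON_TABLE["ATG"], CODON_TABLE["TGA"])  # M *
```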


Yes, I agree, but it was a pointer in the right direction, which is why I mentioned that I haven't tested the code. The user should at least bear responsibility for testing and usage, although that may not happen if someone is new to the subject and unable to verify the result.

10.1 years ago
Neilfws 49k

There are multiple tools available online to do this. Did you try "translate 6 frames" as a Google search? Here's the top hit.

10.1 years ago
shane.neeley ▴ 50

Here is some python to translate all six frames of DNA.

beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"

gencode = {
      'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
      'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
      'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
      'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
      'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
      'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
      'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
      'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
      'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
      'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
      'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
      'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
      'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
      'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
      'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
      'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

def translate_frameshifted( sequence ):
      translate = ''.join([gencode.get(sequence[3*i:3*i+3],'X') for i in range(len(sequence)//3)])
      return translate

print(translate_frameshifted(beans[0:]))    # first frame
print(translate_frameshifted(beans[1:]))    # second frame
print(translate_frameshifted(beans[2:]))    # third frame
print(translate_frameshifted(beans[::-1][0:]))    # first frame from end
print(translate_frameshifted(beans[::-1][1:]))    # second frame from end
print(translate_frameshifted(beans[::-1][2:]))    # third frame from end

# This ::-1 syntax in python means reverse the string.

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
FYW_PSVWTKFSK_QVFVS
SIGSRRYGPNSVNNKSLCQ
LLVAVGMDQIQ_ITSLCVS

This code is nice, but it translates only frames 1 to 3 correctly. For the -1 to -3 frames, the reverse complement DNA is required.

7.2 years ago
iontrap.ms ▴ 10

Result:

_LCF_TINDLNQVWLPMVI
DCVSEQ_MT_TRYGCRWLS
TVFLNNK_LKPGMAADGYL
KITIGSHTWFKSFIVQKHS
R_PSAAIPGLSHLLFRNTV
DNHRQPYLV_VIYCSETQS

3.9 years ago
Kzra ▴ 40

I have written a program in Python 3 that takes a nucleotide FASTA file as input and translates each sequence in that file in the frame that produces the fewest stop codons.

The software translates each sequence in all six frames and counts the number of stop codons in each. It then writes the original sequence in the 'optimal' frame to an output FASTA file, with the frame name appended to the contig name. Where there are multiple optimal frames, Optimal Translate writes each of them to the output FASTA file. It only needs Python 3 to run and works on both DNA and RNA sequences.

The software can be run in Python or as a Geneious 11 plugin on Windows.

You can access it here: https://github.com/Kzra/Optimal-Translate
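
The idea of choosing the frame with the fewest stop codons can be sketched in a few lines. This is a minimal sketch and not the Optimal-Translate code: it assumes the standard genetic code with `*` for stop, and the names `six_frames` and `fewest_stop_frames` are hypothetical. The test sequence is the `beans` string used elsewhere in this thread.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {frame_name: protein} for frames +1..+3 and -1..-3."""
    seq = seq.upper().replace("U", "T")   # treat RNA input as DNA
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement for the minus strand
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def fewest_stop_frames(seq):
    """Return the names of all frames tied for the fewest stop codons."""
    stops = {name: protein.count("*") for name, protein in six_frames(seq).items()}
    best = min(stops.values())
    return [name for name, n in stops.items() if n == best]


beans = "TGACTGTGTTTCTGAACAATAAATGACTTAAACCAGGTATGGCTGCCGATGGTTATCTT"
print(fewest_stop_frames(beans))  # ['-1']
```

Counting stops is a crude criterion: for short sequences several frames can tie, and a frame without stops is not necessarily the coding frame.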
