Question: How Do I Translate Multiple (More Than 25000) DNA Sequences With Different Frames To Protein Sequence?
biostar (150) wrote 5.7 years ago:

How can I translate multiple (more than 25000) DNA sequences with different frames to protein sequences? Is there any program or Perl script I can use to do that? I am also not sure whether I can include all the sequences, each with its own frame, in a single translation run. Please share any information on this. Thanks!

protein dna • 12k views
modified 9 weeks ago by kabir.deb0353 (0) • written 5.7 years ago by biostar (150)

Thanks guys! @Pavel, I think it lets me submit the same frame for multiple sequences, but how do I include multiple sequences with different frames in one batch submission? Also, is there a way to omit sequences that have stop codons in the frame used for translation? @Biolab, so the script you mentioned only works for frame 1? How do I translate the other frames? Sorry, I am a novice in Perl. Thanks a bunch!

modified 5.7 years ago • written 5.7 years ago by biostar (150)

These are good questions, but they take some work on your side: you need to extract and organize the sequences, compose the proper command lines, and so on. The two programs behave differently. sixpack extracts all of the possible ORFs from a single sequence and lets you customize that process, so you would need to pre-process (split) your dataset first. transeq simply translates the whole batch and marks stop codons with *, so you would need to post-process its output. If the ORF positions are known, transeq can take the coordinates and translate only those regions.

written 5.7 years ago by Pavel Senin (1.9k)
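
The post-processing discussed in this comment thread (a different frame for each sequence, and dropping translations that contain stop codons) could be sketched in pure Python roughly as follows. This is only an illustration under stated assumptions, not what transeq or sixpack do internally: the function names, the `frames` mapping from sequence name to frame, and the convention that negative frames are read from the reverse complement are all made up for the example. It uses the standard genetic code and tolerates a single stop codon at the very end of a translation.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(lines):
    """Yield (name, sequence) pairs; a sequence may span several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate(dna, frame):
    """Translate in frame 1..3, or -1..-3 on the reverse complement (assumed convention)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"unsupported frame: {frame}")
    seq = dna.upper().replace("U", "T")
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    # Codons with N or other ambiguity codes are written as "X".
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(start, len(seq) - 2, 3))


def translate_batch(records, frames):
    """Yield (name, protein) for each record, skipping internal stop codons."""
    for name, dna in records:
        protein = translate(dna, frames[name])
        if "*" in protein[:-1]:      # a stop anywhere but the last position
            continue
        yield f"{name}_frame{frames[name]}", protein


# Toy input: seq3 has an internal stop in frame 1 and is dropped.
fasta = [">seq1", "ATGGCC", "TAA", ">seq2", "CATGGCC", ">seq3", "ATGTAAGGG"]
frames = {"seq1": 1, "seq2": 2, "seq3": 1}
for name, protein in translate_batch(read_fasta(fasta), frames):
    print(f">{name}\n{protein}")
```

With the toy input above this prints `seq1_frame1` (MA*) and `seq2_frame2` (MA). For a real dataset, the `frames` mapping would have to come from wherever the per-sequence frames are recorded, for example a two-column table.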

Hi Youwanpras, I am also a Perl beginner. I wrote the script below. It works, but you had better test it yourself. Note that each sequence must be on a single line (I am not sure how to improve that). My script is not concise, so it would be helpful to ask others on BIOSTARS, as many experts are here. Hope it helps!

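#!/usr/bin/perl
# Print every sequence of a FASTA file in each of the six reading frames.
# The output is still nucleotide sequence: each record starts at the first
# base of the requested frame, so it can then be translated as frame 1.
# Usage: perl this_script.pl input.fasta > output.fasta
use strict;
use warnings;

my $file = shift @ARGV or die "Usage: $0 input.fasta\n";

my @frames = (1, 2, 3, -1, -2, -3);    # six frames
foreach my $frame (@frames) {
    frame($file, $frame);
}

sub frame {
    my ($fasta, $f) = @_;
    open my $in, '<', $fasta or die "Cannot open $fasta: $!\n";

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;    # strip Unix or Windows line endings
        next if $line eq '';     # skip blank lines

        if ($line =~ /^>\s*(\S*)/) {
            print ">${1}_frame$f\n";
        }
        elsif ($f > 0) {
            # each sequence should be on a single line
            print substr($line, $f - 1), "\n";
        }
        else {
            # reverse complement, then shift into the frame
            my $revcomp = reverse $line;
            $revcomp =~ tr/ATCGatcg/TAGCtagc/;
            print substr($revcomp, abs($f) - 1), "\n";
        }
    }
    close $in;
}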
modified 5.7 years ago • written 5.7 years ago by biolab (1.1k)

I am trying to translate DNA sequences with your code, but it gives the following warning messages:

Use of uninitialized value $_ in scalar chomp at .....
Use of uninitialized value $_ in pattern match (m//) at .....
Use of uninitialized value $m in subtraction (-) at .....
Use of uninitialized value $_ in substr at .....

Could you please help me fix it?

written 3.4 years ago by tcf.hcdg60
Pavel Senin (1.9k), Los Alamos, NM, wrote 5.7 years ago:

EMBOSS can do it really fast.

1. sixpack

sixpack reads a DNA sequence and writes an output file giving out the forward and reverse sense sequences with the three forward and (optionally) three reverse translations in a pretty display format. A genetic code may be specified for the translation. There are various options to control the appearance of the output file. It also writes a file of protein sequences corresponding to any open reading frames that are larger than the specified minimum size: the default of 1 base shows all possible open reading frames.

2. transeq

transeq reads one or more nucleotide sequences and writes the corresponding protein sequence translations to file. It can translate in any of the 3 forward or three reverse sense frames, or in all three forward or reverse frames, or in all six frames. The translation may be restricted to specified regions, for example, corresponding to the coding regions of your sequences. It can translate using the standard ('Universal') genetic code and also with a selection of non-standard codes.
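
As a rough sketch of how a transeq run over a whole FASTA file might be scripted, the Python helper below only assembles the command line and, optionally, runs it. The qualifiers `-sequence`, `-outseq`, `-frame` and `-table` are taken from the EMBOSS transeq documentation as understood here, so please confirm them with `transeq -help` on your installation. The function names and the file names `dna.fasta` and `pep.fasta` are placeholders for the example.

```python
import shutil
import subprocess

# Frame values that transeq is documented to accept:
# single frames, F (three forward), R (three reverse), 6 (all six).
VALID_FRAMES = {"1", "2", "3", "F", "-1", "-2", "-3", "R", "6"}


def transeq_command(infile, outfile, frame="6", table=0):
    """Build the argument list for one transeq run (nothing is executed)."""
    frame = str(frame)
    if frame not in VALID_FRAMES:
        raise ValueError(f"unsupported frame: {frame}")
    return [
        "transeq",
        "-sequence", infile,     # nucleotide input (may hold many sequences)
        "-outseq", outfile,      # protein output
        f"-frame={frame}",
        f"-table={table}",       # genetic code table; 0 is the standard code
    ]


def run_transeq(infile, outfile, frame="6", table=0):
    """Run transeq if it is installed; raise a clear error otherwise."""
    if shutil.which("transeq") is None:
        raise RuntimeError("transeq not found on PATH; is EMBOSS installed?")
    subprocess.run(transeq_command(infile, outfile, frame, table), check=True)


# Show the command that would be run for a six-frame translation.
print(" ".join(transeq_command("dna.fasta", "pep.fasta")))
```

Because one transeq run applies the same frame setting to every sequence in the input, one way to handle sequences with different frames would be to split the FASTA file by frame and build one command per group.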

FYI the current EMBOSS documentation and downloads can be found at: http://emboss.open-bio.org/. The old EMBOSS SourceForge site is obsolete.

EMBOSS contains a number of programs related to sequence translation (see B.6.25. Applications in group Nucleic:translation) and gene/ORF finding (see B.6.17. Applications in group Nucleic:gene finding).

modified 5.7 years ago • written 5.7 years ago by Hamish (3.1k)

The obsolescence is not mentioned anywhere on SourceForge, although many links do point to open-bio.org. How do you know that it is obsolete?

written 5.7 years ago by Pavel Senin (1.9k)

From discussions with the EMBOSS developers.

The emboss.open-bio.org site is based on the content written for the EMBOSS books. The aim is to maintain the book content through updates to the emboss.open-bio.org site. While the content generated from the EMBOSS sources has been updated at SourceForge, the rest of the content is severely out of date, incomplete and occasionally misleading.

written 5.7 years ago by Hamish (3.1k)
biolab (1.1k) wrote 5.7 years ago:

The following script translates every sequence in frame +1. It uses the Bio::SeqIO module from BioPerl.

use strict;
use warnings;
use Bio::SeqIO;

# Translate every sequence of a FASTA file in frame +1 and write the
# resulting proteins to a second FASTA file.
sub TranslateDNAFile {
    my ($infile, $outfile) = @_;
    my $in  = Bio::SeqIO->new(-file => $infile,     -format => "fasta");
    my $out = Bio::SeqIO->new(-file => ">$outfile", -format => "fasta");

    while (my $seq = $in->next_seq()) {
        $out->write_seq($seq->translate);
    }
}

my $DNAfile = "dna.fasta";
my $pepfile = "pep.fasta";
TranslateDNAFile($DNAfile, $pepfile);
x.jack.min (20) wrote 3.9 years ago:

try -

http://proteomics.ysu.edu/tools/OrfPredictor.html

Kzra (30), University of British Columbia, wrote 9 months ago:

I have written a program in Python 3 that takes a nucleotide FASTA file as input and translates each sequence in that file in the frame that produces the fewest stop codons.

The software translates each sequence in all six frames and counts the number of stop codons in each. It then writes the original sequence in the 'optimal' frame to an output FASTA file, with the frame name appended to the contig name. When several frames are equally optimal, Optimal Translate writes each of them to the output FASTA file. It needs only Python 3 to run and works on both DNA and RNA sequences.

You can access it here: https://github.com/Kzra/Optimal-Translate
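
The frame-selection idea described in this answer (translate in all six frames, count the stop codons, keep the frame or frames with the fewest) can be sketched in a few lines of Python. This is an independent illustration and not the code from the Optimal-Translate repository: the function names are invented for the example, the standard genetic code is assumed, and negative frames are assumed to be read from the reverse complement.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
FRAMES = (1, 2, 3, -1, -2, -3)


def translate(seq, frame):
    """Translate DNA or RNA in one of the six frames; unknown codons give 'X'."""
    seq = seq.upper().replace("U", "T")   # treat RNA like DNA
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(start, len(seq) - 2, 3))


def stop_counts(seq):
    """Map each of the six frames to the number of stop codons it contains."""
    return {frame: translate(seq, frame).count("*") for frame in FRAMES}


def best_frames(seq):
    """Return every frame that ties for the fewest stop codons."""
    counts = stop_counts(seq)
    fewest = min(counts.values())
    return [frame for frame, n in counts.items() if n == fewest]


# Frame 1 of this toy sequence starts with a stop codon (TAG),
# so it is the only frame excluded from the result.
print(best_frames("TAGATGGCC"))
```

For very short sequences, ties like the one in the toy example are common, which is presumably why reporting all equally good frames is useful.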

kabir.deb0353 (0) wrote 9 weeks ago:

Hello, and apologies for the late reply.

I often use TranslatorX for sequence alignment; it gives you the aligned protein sequences as well as the DNA sequence alignment.

Thanks.

modified 9 weeks ago by ATpoint (23k) • written 9 weeks ago by kabir.deb0353 (0)