Question: How Do I Translate Different Regions Of My Sequence In Different Frames?
gravatar for Gimly_Gloin
7.9 years ago by
Gimly_Gloin70 wrote:

What I want to do is say something like in sequence D translate region 1-100 in frame 1 from 102-150 in frame 3 from 160-180 frame -3

I would typically extract this information from a blastx table and (the idea I had) feed it into EMBOSS's transeq... However transeq seems to mess up which frame does what, and even in the help it says (for the -region function) :

"Note: you should not try to use this option with any other frame than the default, -frame=1"

I wonder if there is any software that can do this? I've looked at bioperl, and haven't seen anything that caught my eye .... ? thoughts?

translation bioperl blast • 2.8k views
ADD COMMENTlink modified 4.4 years ago by Biostar ♦♦ 20 • written 7.9 years ago by Gimly_Gloin70
gravatar for Neilfws
7.9 years ago by
Sydney, Australia
Neilfws48k wrote:

I think you can use the Bioperl methods subseq, translate and revcom to achieve what you want. A brief example using this test sequence, saved as nirs.fa:

#!/usr/bin/perl -w 
use strict;
use Bio::SeqIO;

my $file  = "nirs.fa";
my $seqio = Bio::SeqIO->new(-file => "$file", -format => "fasta");
my $seqi  = $seqio->next_seq;

my $s1 = $seqi->subseq(1,100);
my $s2 = $seqi->subseq(102,150);

# apologies for chaining together these methods...
# frame 1: in Bioperl frames 1,2,3 = 0,1,2
print Bio::Seq->new(-seq => $s1)->translate(-frame => 0)->seq, "\n";
# frame -3: we reverse complement, then frame -3 = Bioperl frame 2
print Bio::Seq->new(-seq => $s2)->revcom->translate(-frame => 2)->seq, "\n";

I too am somewhat confused by EMBOSS transeq (but I think it is doing what you want). Here are the equivalent commands, with results:

transeq -sequence nirs.fa -frame 1 -regions 1:100 -stdout -auto
# >2576786-2578450_1 Ralstonia eutropha H16 chromosome 2, complete sequence

transeq -sequence nirs.fa -frame -3 -regions 102:150 -stdout -auto             
# >2576786-2578450_6 Ralstonia eutropha H16 chromosome 2, complete sequence

So transeq seems to be returning one residue more. I'm sure there's a good explanation if we think about it. Maybe the Bioperl method assumes that the last 3 bases are a terminator unless told otherwise.

ADD COMMENTlink modified 7.9 years ago • written 7.9 years ago by Neilfws48k

I get the correct frames from parsing my blastx output by printing out $hsp->hitstring or $hsp->querystring.thats not the issue what I would like is to adjust that information from the blast output so that instead of getting 5 or 6 fragments of the protein I get a single string that is correct for the protein in the several frames blastx has detected. I thought I could do this with transec, but for any given information in the blast table it shows a different result to the transec output (I've also corrected the GFF frame() used in SearchIO to show the blastX frame).Is blast skipping bases?

ADD REPLYlink written 7.9 years ago by Gimly_Gloin70

Also it doesn't allow multiple commands

ADD REPLYlink written 7.9 years ago by Gimly_Gloin70

I think your probably right about transeq though... too much "intuition"

ADD REPLYlink written 7.9 years ago by Gimly_Gloin70

I would not use the BLAST output for this task. Since BLAST tells you the frames, why not just go back to the original input sequence? As your question says: "How do I translate different regions of my sequence in different frames."

ADD REPLYlink written 7.9 years ago by Neilfws48k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1070 users visited in the last hour