How Do I Translate Different Regions Of My Sequence In Different Frames?
1
2
Entering edit mode
12.1 years ago
Gimly_Gloin ▴ 70

What I want to do is say something like in sequence D translate region 1-100 in frame 1 from 102-150 in frame 3 from 160-180 frame -3

I would typically extract this information from a blastx table and (the idea I had) feed it into EMBOSS's transeq... However transeq seems to mess up which frame does what, and even in the help it says (for the -region function) :

"Note: you should not try to use this option with any other frame than the default, -frame=1"

I wonder if there is any software that can do this? I've looked at bioperl, and haven't seen anything that caught my eye .... ? thoughts?

blast translation bioperl • 3.8k views
ADD COMMENT
2
Entering edit mode
12.1 years ago
Neilfws 49k

I think you can use the Bioperl methods subseq, translate and revcom to achieve what you want. A brief example using this test sequence, saved as nirs.fa:

#!/usr/bin/perl -w 
use strict;
use Bio::SeqIO;

my $file  = "nirs.fa";
my $seqio = Bio::SeqIO->new(-file => "$file", -format => "fasta");
my $seqi  = $seqio->next_seq;

my $s1 = $seqi->subseq(1,100);
my $s2 = $seqi->subseq(102,150);

# apologies for chaining together these methods...
# frame 1: in Bioperl frames 1,2,3 = 0,1,2
print Bio::Seq->new(-seq => $s1)->translate(-frame => 0)->seq, "\n";
# MKAKNWLYPAIAALPLSLWLGLPHAATKAEPKA
# frame -3: we reverse complement, then frame -3 = Bioperl frame 2
print Bio::Seq->new(-seq => $s2)->revcom->translate(-frame => 2)->seq, "\n";
# RGRTQRQ*GWGWRLS

I too am somewhat confused by EMBOSS transeq (but I think it is doing what you want). Here are the equivalent commands, with results:

transeq -sequence nirs.fa -frame 1 -regions 1:100 -stdout -auto
# >2576786-2578450_1 Ralstonia eutropha H16 chromosome 2, complete sequence
# MKAKNWLYPAIAALPLSLWLGLPHAATKAEPKAX

transeq -sequence nirs.fa -frame -3 -regions 102:150 -stdout -auto             
# >2576786-2578450_6 Ralstonia eutropha H16 chromosome 2, complete sequence
# RGRTQRQ*GWGWRLSV

So transeq seems to be returning one residue more. I'm sure there's a good explanation if we think about it. Maybe the Bioperl method assumes that the last 3 bases are a terminator unless told otherwise.

ADD COMMENT
0
Entering edit mode

I get the correct frames from parsing my blastx output by printing out $hsp->hitstring or $hsp->querystring.thats not the issue what I would like is to adjust that information from the blast output so that instead of getting 5 or 6 fragments of the protein I get a single string that is correct for the protein in the several frames blastx has detected. I thought I could do this with transec, but for any given information in the blast table it shows a different result to the transec output (I've also corrected the GFF frame() used in SearchIO to show the blastX frame).Is blast skipping bases?

ADD REPLY
0
Entering edit mode

Also it doesn't allow multiple commands

ADD REPLY
0
Entering edit mode

I think your probably right about transeq though... too much "intuition"

ADD REPLY
0
Entering edit mode

I would not use the BLAST output for this task. Since BLAST tells you the frames, why not just go back to the original input sequence? As your question says: "How do I translate different regions of my sequence in different frames."

ADD REPLY

Login before adding your answer.

Traffic: 2643 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6