Question

Randomize CDS While Maintaining Amino Acid Sequence

9.2 years ago

sheinsch ▴ 10

I am trying to remove promoter recognition sites and transcription factor binding sites from several coding sequences. Is there a tool that will randomize the nucleotide sequence while maintaining the amino acid sequence?

EDIT:

I will be expressing the proteins under a variety of promoters. What I am trying to do now is remove any sites within the CDS that could potentially bind transcription factors or RNA polymerase.

gene • 1.9k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by sheinsch ▴ 10

Why do you want to randomize the nucleotide sequence of the protein-coding region if your goal is to remove promoter recognition sites and TF binding sites, which are usually located upstream of the coding region?

9.2 years ago by Sam ★ 4.7k

Ram · Answer 1 · 2015-01-21

I played around with Perl, and this should give you a freshly randomized sequence on each run:

#!/usr/bin/perl
use strict;
use warnings;

# Expect exactly two arguments: the codon table file and the CDS.
if (@ARGV != 2) {
    print "\nUsage: aminoRand.pl <Codon Table File> <nucleotide sequence>\n";
    exit 1;
}
my ($table_file, $seq) = @ARGV;

open(my $codon_fh, '<', $table_file) or die "Cannot open $table_file: $!";

my %codon     = ();    # amino acid => array ref of its synonymous codons
my %translate = ();    # codon => amino acid
while (my $line = <$codon_fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines
    # First field is the amino acid, the remaining fields are its codons
    my ($key, @codes) = split ' ', $line;
    $codon{$key} = \@codes;
    $translate{$_} = $key for @codes;
}
close($codon_fh);

chomp $seq;
my $length = length($seq);
warn "Sequence length is not a multiple of 3\n" if $length % 3;

# Walk the CDS codon by codon and print a random synonymous codon for each.
for (my $i = 0; $i < $length; $i += 3) {
    my $current = substr $seq, $i, 3;
    if (exists $translate{$current}) {
        my @possible = @{ $codon{ $translate{$current} } };
        print $possible[rand @possible];
    }
    else {
        # Report on STDERR so the message does not end up inside the sequence
        warn "Can't find: $current\n";
    }
}
print "\n";

You will need to provide a codon table file in the following format, with one amino acid per line followed by all of its codons:

I  ATT  ATC  ATA
L  CTT  CTC  CTA  CTG  TTA  TTG

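Pass the codon file as the first argument and the sequence as the second. The script then prints a randomly generated nucleotide sequence that encodes the same amino acid sequence. The table has to cover every codon that occurs in your CDS (including the stop codon), otherwise those codons are reported as not found.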
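Since only two rows of the codon file are shown, here is a rough sketch of how the full table could be built and how the result could be checked. It assumes the standard genetic code (NCBI translation table 1) with `*` as the stop symbol. The helper names `codon_table_text`, `randomize_cds` and `translate_cds`, and the short example CDS, are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: standard genetic code (NCBI translation table 1), codons
# enumerated with bases in T, C, A, G order; '*' marks a stop codon.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my (%codon, %translate);    # amino acid => codons, codon => amino acid
my $n = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            my $aa = substr $aas, $n++, 1;
            push @{ $codon{$aa} }, "$b1$b2$b3";
            $translate{"$b1$b2$b3"} = $aa;
        }
    }
}

# Hypothetical helper: render the table in the codon file layout.
sub codon_table_text {
    return join '', map { join('  ', $_, @{ $codon{$_} }) . "\n" } sort keys %codon;
}

# Hypothetical helper: swap every codon for a random synonymous one.
sub randomize_cds {
    my ($seq) = @_;
    my $out = '';
    for my $triplet ($seq =~ /(.{3})/g) {
        die "Unknown codon: $triplet\n" unless exists $translate{$triplet};
        my @possible = @{ $codon{ $translate{$triplet} } };
        $out .= $possible[rand @possible];
    }
    return $out;
}

# Hypothetical helper: translate a CDS, using 'X' for unknown codons.
sub translate_cds {
    my ($seq) = @_;
    my $protein = '';
    for my $triplet ($seq =~ /(.{3})/g) {
        $protein .= exists $translate{$triplet} ? $translate{$triplet} : 'X';
    }
    return $protein;
}

my $cds = 'ATGATTCTGAAATAA';    # made-up example CDS
my $new = randomize_cds($cds);
print "$new\n";
die "Protein changed\n" unless translate_cds($new) eq translate_cds($cds);
```

Printing `codon_table_text()` to a file should give a codon file in the layout shown above. Comparing the translations before and after is a cheap sanity check that only synonymous changes were made. If your organism uses a different genetic code, the amino acid string would need to be swapped accordingly.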