8.6 years ago
Stephane Plaisance
I used a simple BioPerl script to convert a GenBank (GB) file to EMBL format, and manual inspection of the output reveals strange behaviour in how split locations are written:
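#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
# Convert a GenBank file to EMBL format.
if (@ARGV != 2) { die "USAGE: gb2embl.pl <GB_file> <EMBL_name>\n"; }
# Input stream: read GenBank records from the first argument.
my $seqio = Bio::SeqIO->new(-format => 'genbank', -file => $ARGV[0]);
# Output stream: write EMBL records to the file named by the second argument.
my $seqout = Bio::SeqIO->new(-format => 'embl', -file => ">$ARGV[1]");
while (my $seq = $seqio->next_seq) {
    $seqout->write_seq($seq);
}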
A piece of the GenBank-formatted input:
 mRNA            complement(join(280..5840,5989..6007))
                 /gene="YRF1-7"
                 /locus_tag="YPL283C"
                 /gene_synonym="YRF1"
                 /product="Y' element ATP-dependent helicase protein 1 copy
                 7"
The same feature in the EMBL output:
FT   mRNA            join(complement(280..5840),complement(5989..6007))
FT                   /locus_tag="YPL283C"
FT                   /gene_synonym="YRF1"
FT                   /gene="YRF1-7"
FT                   /product="Y' element ATP-dependent helicase protein 1 copy
FT                   7"
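I do not think that this is OK. Writing the two segments as -1-> and -2->:
complement(join(-1->, -2->)) = <-2- <-1-
join(complement(-1->), complement(-2->)) = <-1- <-2-
In the first form the segments are joined and the whole is then reverse-complemented, so segment 2 comes first. In the second form each segment is reverse-complemented in place and the original order is kept. The segment order is therefore inverted between the two notations, and the EMBL output would describe a wrongly fused sequence. An equivalent location would need the segments swapped: join(complement(5989..6007),complement(280..5840)).
Does anybody agree, or have comments?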
BioPerl version used: 1.007001
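To make the ordering argument concrete, here is a minimal sketch in plain Perl (no BioPerl needed). The 10-base sequence and the ranges 1..3 and 8..10 are made-up stand-ins for the real coordinates, and the helper names `revcomp`, `segment`, `complement_of_join` and `join_of_complements` are hypothetical, chosen only for this illustration. It follows the usual reading of the location operators, where `join` concatenates pieces in the listed order and `complement` reverse-complements its argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Extract a 1-based, inclusive range from a sequence.
sub segment {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start - 1, $end - $start + 1);
}

# complement(join(a,b)): concatenate in listed order, then
# reverse-complement the whole thing.
sub complement_of_join {
    my ($seq, @ranges) = @_;
    return revcomp(join '', map { segment($seq, @$_) } @ranges);
}

# join(complement(a),complement(b)): reverse-complement each piece,
# keeping the listed order.
sub join_of_complements {
    my ($seq, @ranges) = @_;
    return join '', map { revcomp(segment($seq, @$_)) } @ranges;
}

my $toy    = "AACCGGTTAC";            # hypothetical toy sequence
my @ranges = ([1, 3], [8, 10]);       # toy stand-ins for 280..5840 and 5989..6007

print "complement(join(1..3,8..10))             = ",
    complement_of_join($toy, @ranges), "\n";
print "join(complement(1..3),complement(8..10)) = ",
    join_of_complements($toy, @ranges), "\n";
print "join(complement(8..10),complement(1..3)) = ",
    join_of_complements($toy, reverse @ranges), "\n";
```

On this toy input the first and third lines print the same string, while the second (the form written in the EMBL output above, with the segments left in their original order) differs. That is the inversion described in the post.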
You have a point, I did not consider this. Thanks for your clarification!
Hi Stephane,
Please use ADD COMMENT to reply to earlier answers, so that this thread remains logically structured and easy to follow. Thanks!
Cheers,
Wouter