bioperl Bio::SeqIO join + complement weirdness
1
1
Entering edit mode
7.1 years ago

I used a simplistic script to convert GB to EMBL format and manual inspection reveals strange behaviour

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

if (@ARGV != 2) { die "USAGE: gb2embl.pl <GB_file> <EMBL_name>\n"; }

my $seqio = Bio::SeqIO->new('-format' => 'genbank', '-file' => "$ARGV[0]");
my $seqout = new Bio::SeqIO('-format' => 'embl', '-file' => ">$ARGV[1]");
while( my $seq = $seqio->next_seq) {
  $seqout->write_seq($seq)
}

a piece of input GB formatted

 mRNA            complement(join(280..5840,5989..6007))
                 /gene="YRF1-7"
                 /locus_tag="YPL283C"
                 /gene_synonym="YRF1"
                 /product="Y' element ATP-dependent helicase protein 1 copy
                 7"

the same piece in the embl output

FT   mRNA            join(complement(280..5840),complement(5989..6007))
FT                   /locus_tag="YPL283C"
FT                   /gene_synonym="YRF1"
FT                   /gene="YRF1-7"
FT                   /product="Y' element ATP-dependent helicase protein 1 copy
FT                   7"

I do not think that this is OK

complement( join ( -1-> -2-> ) ) = <-2- <-1-
join( complement( -1-> ) complement ( -2-> ) ) = <-1- <-2-

the order gets inverted by this logic, leading to chaotic sequence fusion

Anybody agrees and/or has comments?

Bioperl used: 1.007001

bioperl • 1.7k views
ADD COMMENT
0
Entering edit mode

you have a point, I did not consider this. thanks for your clarification!

ADD REPLY
0
Entering edit mode

Hi Stephane,

Please use ADD COMMENT to reply to earlier answers, as such this thread remains logically structured and easy to follow. Thanks!

Cheers,
Wouter

ADD REPLY
1
Entering edit mode
7.1 years ago
Michael 54k

Why do you think that the order gets inverted? It says _complement_ not _reverse complement_. If it was reverse complement, you would be right.

compl( ACCG ) + compl( TTAA) = UGGC + AAUU = UGGC AAUU = compl (ACCG TTAA) = compl (ACCG + TTAA)

It seems as if the mRNA splice string is processed somehow by bioperl, but it would be interesting to check a reverse strand gene also.

ADD COMMENT

Login before adding your answer.

Traffic: 2721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6