Question: bioperl Bio::SeqIO join + complement weirdness
1
gravatar for Stephane Plaisance
2.3 years ago by
Leuven area (Belgium)
Stephane Plaisance400 wrote:

I used a simplistic script to convert GB to EMBL format and manual inspection reveals strange behaviour

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

if (@ARGV != 2) { die "USAGE: gb2embl.pl <GB_file> <EMBL_name>\n"; }

my $seqio = Bio::SeqIO->new('-format' => 'genbank', '-file' => "$ARGV[0]");
my $seqout = new Bio::SeqIO('-format' => 'embl', '-file' => ">$ARGV[1]");
while( my $seq = $seqio->next_seq) {
  $seqout->write_seq($seq)
}

a piece of input GB formatted

 mRNA            complement(join(280..5840,5989..6007))
                 /gene="YRF1-7"
                 /locus_tag="YPL283C"
                 /gene_synonym="YRF1"
                 /product="Y' element ATP-dependent helicase protein 1 copy
                 7"

the same piece in the embl output

FT   mRNA            join(complement(280..5840),complement(5989..6007))
FT                   /locus_tag="YPL283C"
FT                   /gene_synonym="YRF1"
FT                   /gene="YRF1-7"
FT                   /product="Y' element ATP-dependent helicase protein 1 copy
FT                   7"

I do not think that this is OK

complement( join ( -1-> -2-> ) ) = <-2- <-1-
join( complement( -1-> ) complement ( -2-> ) ) = <-1- <-2-

the order gets inverted by this logic, leading to chaotic sequence fusion

Anybody agrees and/or has comments?

Bioperl used: 1.007001

bioperl • 678 views
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Stephane Plaisance400

you have a point, I did not consider this. thanks for your clarification!

ADD REPLYlink written 2.3 years ago by Stephane Plaisance400

Hi Stephane,

Please use ADD COMMENT to reply to earlier answers, as such this thread remains logically structured and easy to follow. Thanks!

Cheers,
Wouter

ADD REPLYlink written 2.3 years ago by WouterDeCoster40k
1
gravatar for Michael Dondrup
2.3 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Why do you think that the order gets inverted? It says _complement_ not _reverse complement_. If it was reverse complement, you would be right.

compl( ACCG ) + compl( TTAA) = UGGC + AAUU = UGGC AAUU = compl (ACCG TTAA) = compl (ACCG + TTAA)

It seems as if the mRNA splice string is processed somehow by bioperl, but it would be interesting to check a reverse strand gene also.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Michael Dondrup46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 626 users visited in the last hour