Question: bioperl Bio::SeqIO join + complement weirdness
1
gravatar for Stephane Plaisance
24 months ago by
Leuven area (Belgium)
Stephane Plaisance380 wrote:

I used a simplistic script to convert GB to EMBL format and manual inspection reveals strange behaviour

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

if (@ARGV != 2) { die "USAGE: gb2embl.pl <GB_file> <EMBL_name>\n"; }

my $seqio = Bio::SeqIO->new('-format' => 'genbank', '-file' => "$ARGV[0]");
my $seqout = new Bio::SeqIO('-format' => 'embl', '-file' => ">$ARGV[1]");
while( my $seq = $seqio->next_seq) {
  $seqout->write_seq($seq)
}

a piece of input GB formatted

 mRNA            complement(join(280..5840,5989..6007))
                 /gene="YRF1-7"
                 /locus_tag="YPL283C"
                 /gene_synonym="YRF1"
                 /product="Y' element ATP-dependent helicase protein 1 copy
                 7"

the same piece in the embl output

FT   mRNA            join(complement(280..5840),complement(5989..6007))
FT                   /locus_tag="YPL283C"
FT                   /gene_synonym="YRF1"
FT                   /gene="YRF1-7"
FT                   /product="Y' element ATP-dependent helicase protein 1 copy
FT                   7"

I do not think that this is OK

complement( join ( -1-> -2-> ) ) = <-2- <-1-
join( complement( -1-> ) complement ( -2-> ) ) = <-1- <-2-

the order gets inverted by this logic, leading to chaotic sequence fusion

Anybody agrees and/or has comments?

Bioperl used: 1.007001

bioperl • 602 views
ADD COMMENTlink modified 24 months ago • written 24 months ago by Stephane Plaisance380

you have a point, I did not consider this. thanks for your clarification!

ADD REPLYlink written 24 months ago by Stephane Plaisance380

Hi Stephane,

Please use ADD COMMENT to reply to earlier answers, as such this thread remains logically structured and easy to follow. Thanks!

Cheers,
Wouter

ADD REPLYlink written 24 months ago by WouterDeCoster37k
1
gravatar for Michael Dondrup
24 months ago by
Bergen, Norway
Michael Dondrup45k wrote:

Why do you think that the order gets inverted? It says _complement_ not _reverse complement_. If it was reverse complement, you would be right.

compl( ACCG ) + compl( TTAA) = UGGC + AAUU = UGGC AAUU = compl (ACCG TTAA) = compl (ACCG + TTAA)

It seems as if the mRNA splice string is processed somehow by bioperl, but it would be interesting to check a reverse strand gene also.

ADD COMMENTlink modified 24 months ago • written 24 months ago by Michael Dondrup45k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 767 users visited in the last hour