Question: Usage of Perl Needleman/Wunsch Module
0
gravatar for biolab
4.3 years ago by
biolab1.0k
biolab1.0k wrote:

Dear all,

I installed Needleman-Wunsch module, however have some problems to handle it.  Could anyone help to write a simple example to illustrate how to use it?  My second question is: two sequences could align in different forms, my work actually only needs the best alignment, however I'd like to know how to output the other forms.  Where is the parameter to  do this?  MY third question is: From literature I know the score normally is set as follows,  match:1, mismatch:-1, gap:-2, then what's gap extention and gap open? WHat's your favorite score scheme? Note I am aligning nucleotide sequences.

I will much appreciate your help if you answer any of my three questions. THANK YOU!!

SYNOPSIS
           use Algorithm::NeedlemanWunsch;
           sub score_sub {
               if (!@_) {
                   return -2; # gap penalty
               }
               return ($_[0] eq $_[1]) ? 1 : -1;
           }
           my $matcher = Algorithm::NeedlemanWunsch->new(\&score_sub);
           my $score = $matcher->align(
                      \@a,
                      \@b,
                      {   align     => \&on_align,
                          shift_a => \&on_shift_a,
                          shift_b => \&on_shift_b,
                          select_align => \&on_select_align
                      });

DESCRIPTION
       Sequence alignment is a way to find commonalities in two (or more)
       similar sequences or strings of some items or characters. Standard
       motivating example is the comparison of DNA sequences and their
       functional and evolutionary similarities and differences, but the
       problem has much wider applicability - for example finding the longest
       common subsequence (that is, "diff") is a special case of sequence
       alignment.
... ...

METHODS
   Standard algorithm
       new(\&score_sub [, $gap_penalty ])

       The constructor. Takes one mandatory argument, which is a coderef to a
       sub implementing the similarity matrix, plus an optional gap penalty
       argument. If the gap penalty isn't specified as a constructor argument,
       the "Algorithm::NeedlemanWunsch" object gets it by calling the scoring
       sub without arguments; apart from that case, the sub is called with 2
       arguments, which are items from the first and second sequence,
       respectively, passed to "Algorithm::NeedlemanWunsch::align". Note that
       the sub must be pure, i.e. always return the same value when called
       with the same arguments.

       align(\@a, \@b [, \%callbacks ])

       The core of the algorithm. Creates a bottom-up dynamic programming
       matrix, fills it with alignment scores and then traces back to find an
       optimal alignment, informing the application about its items by
       invoking the callbacks passed to the method.

       The first 2 arguments of "align" are array references to the aligned
       sequences, the third a hash reference with user-supplied callbacks. The
       callbacks are identified by the hash keys, which are as follows:

       align
           Aligns two sequence items. The callback is called with 2 arguments,
           which are the positions of the paired items in "\@a" and "\@b",
           respectively.

       shift_a
           Aligns an item of the first sequence with a gap in the second
           sequence. The callback is called with 1 argument, which is the
           position of the item in "\@a".

       shift_b
           Aligns a gap in the first sequence with an item of the second
           sequence. The callback is called with 1 argument, which is the
           position of the item in "\@b".

       select_align
           Called when there's more than one way to construct the optimal
           alignment, with 1 argument which is a hashref enumerating the
           possibilities. The hash may contain the following keys:

           align
               If this key exists, the optimal alignment may align two
               sequence items. The key's value is an arrayref with the
               positions of the paired items in "\@a" and "\@b", respectively.

           shift_a
               If this key exists, the optimal alignment may align an item of
               the first sequence with a gap in the second sequence. The key's
               value is the position of the item in "\@a".

           shift_b
               If this key exists, the optimal alignment may align a gap in
               the first sequence with an item of the second sequence. The
               key's value is the position of the item in "\@b".

           All keys are optional, but the hash will always have at least one.
           The callback must select one of the possibilities by returning one
           of the keys.

       All callbacks are optional. When there is just one way to make the
       optimal alignment, the "Algorithm::NeedlemanWunsch" object prefers
       calling the specific callbacks, but will call "select_align" if it's
       defined and the specific callback isn't.

       Note that "select_align" is called instead of the specific callbacks,
       not in addition to them - users defining both "select_align" and other
       callbacks should probably call the specific callback explicitly from
       their "select_align", once it decides which one to prefer.

       Also note that the passed positions move backwards, from the sequence
       ends to zero - if you're building the alignment in your callbacks, add
       items to the front.
... ... 

 

ADD COMMENTlink modified 5 months ago by jero.miranda.rod0 • written 4.3 years ago by biolab1.0k
2

I think you are asking a lot here, I would recommend limiting the scope of your post to one or two questions and try to provide some relevant information about the errors (not just copying all the documentation). What exactly are you trying to do and what problems did you have with the module? Also, what do you want as output, the alignment or the scores, or something else maybe?

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by SES8.1k

Hi SES, thanks for your advices.  I clicked ADD COMMENT or ADD REPLY, but did not work (I tried IE and Firefox), I have to use this answer box. Regarding to my original question, I mostly concern how to use this module.  Could anyone give me an example of using it? For instance, $seq1 = 'ATGCATGCATGC'  $seq2 = 'ATGGATCGAGC'.  I am new in perl, especially handle modules. Thank you very much!

ADD REPLYlink written 4.3 years ago by biolab1.0k
2
gravatar for Michael Dondrup
4.3 years ago by
Bergen, Norway
Michael Dondrup44k wrote:

Here is an example that works and is more instructive than the Synopsis section to explain how  to use the package:


#!/usr/bin/env perl

use strict;
use warnings;
use Algorithm::NeedlemanWunsch;
### scoring function:
sub score_sub {
  if (!@_) {
    return -2;            # gap penalty
  }
  ## mismatch scores -1, match +1
  return ($_[0] eq $_[1]) ? 1 : -1;
} 

# sequences come as strings normally
my $a = "ATCTATC";
my $b = "GTCTTTTTTTTATC";
my @a = split //, $a; # make an array
my @b = split //, $b;

## callbacks that print something useful
## prints an 'alignment string' in the order of the  
## recursion of the dynamic programming algorithm 
## print "-" only on match
sub on_align  { print "align", " " , $a[$_[0]], ($a[$_[0]] eq $b[$_[1]] ) ? "-" : " ", $b[$_[1]], "\n" }; 
sub on_shift_a {  print "gap  ", "" , $a[$_[0]], "\n" };
sub on_shift_b { print "gap  ", "   " , $b[$_[0]], "\n"};
### Dumb select, need to return one of the keys for alternative 
### alignments with equal score. Here, we always take the first option, but don't print it.

sub on_select_align { print "(select_align)\n"; return (keys (%{$_[0]})) [0]};
## one gets the same behaviour with not assigning on_select_align, I am too lazy to implement this callback correctly ...



my $matcher = Algorithm::NeedlemanWunsch->new(\&score_sub);
my $score = $matcher->align(
                \@a,
                \@b,
                {   align     => \&on_align,
                shift_a => \&on_shift_a,
                shift_b => \&on_shift_b,
 #               select_align => \&on_select_align
                });

print "score: $score\n";

 

 

Output:

align C-C
align T-T
align A-A
align T-T
gap     T
gap     T
gap     T
gap     T
gap     T
gap     T
gap     T
align C-C
align T-T
align A G
score: -9

 

ADD COMMENTlink modified 4.3 years ago • written 4.3 years ago by Michael Dondrup44k

Hi Mike, Thank you for your detailed perl script that is much helpful for my work. THANKS!

ADD REPLYlink written 4.3 years ago by biolab1.0k

Dear Mike, i am trying to modify your perl script to change output format (I would like to output two aligned sequences on a single line).  Unfortunately i had troubles. Could you please check this script?  THANKS!

#!/usr/bin/env perl

use strict;
use warnings;
use Algorithm::NeedlemanWunsch;
### scoring function:
sub score_sub {
  if (!@_) {
    return -2;            # gap penalty
  }
  ## mismatch scores -1, match +1
  return ($_[0] eq $_[1]) ? 1 : -1;
}


my $a = "ATCTATC";
my $b = "GTCTTTTTTTTATC";
my @a = split //, $a; # make an array
my @b = split //, $b;

my ($a_align, $b_align);#


## callbacks that print something useful
## prints an 'alignment string' in the order of the  
## recursion of the dynamic programming algorithm
## print "-" only on match
sub on_align  {  
$a_align .= $a[$_[0]];
$b_align .= $b[$_[1]];
};
 
sub on_shift_a { $a_align .= $a[$_[0]];};
sub on_shift_b { $b_align .= $b[$_[0]];};

### Dumb select, need to return one of the keys for alternative
### alignments with equal score. Here, we always take the first option, but don't print it.

sub on_select_align { print "(select_align)\n"; return (keys (%{$_[0]})) [0]};
## one gets the same behaviour with not assigning on_select_align, I am too lazy to implement this callback correctly ...


my $matcher = Algorithm::NeedlemanWunsch->new(\&score_sub);
my $score = $matcher->align(
                \@a,
                \@b,
                {   align     => \&on_align,
                shift_a => \&on_shift_a,
                shift_b => \&on_shift_b,
         #select_align => \&on_select_align
                });

print "$a_align\t$b_align\t";
print "$score\n";

 

My ideal output is :

CTAT-------CTA   CTATTTTTTTTCTG   -9

ADD REPLYlink modified 4.3 years ago • written 4.3 years ago by biolab1.0k
1

Hi Biolab, 

you need to change the on_shift_a and  on_shift_b subroutines:

sub on_shift_a { $a_align .= $a[$_[0]]; $b_align .= '-'};
sub on_shift_b { $b_align .= $b[$_[0]]; $a_align .= '-'};

 

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by trickytank10

I've followed this thread and it's really helpful, thanks. Can someone indicate to me how to use the module Extensions gap_open_penalty and gap_extend_penalty in order to implement affine gap penalties? (The stuff described at http://search.cpan.org/~vbar/Algorithm-NeedlemanWunsch-0.03/lib/Algorithm/NeedlemanWunsch.pm#gap_open_penalty,_gap_extend_penalty )

Thanks for any help,

Nick Goldman

ADD REPLYlink modified 8 months ago • written 8 months ago by goldman.nick0
$matcher->gap_open_penalty($gappen); $matcher->gap_extend_penalty($gapextpen);

Before calling matcher->align(); This works very well but then I tried to implement it over a loop, with $a and $b different sequences each time, it says: Use of uninitialized value $a_align in concatenation (.) or string. Same for $b_align. The callback with print shows the algorithm is working. Anyone has any idea as to how to re-initialize $a_align and $b_align?

ADD REPLYlink written 5 months ago by jero.miranda.rod0

I found this on a russian forum:

my (@align1, @align2);
my $result = $matcher->align($arr1, $arr2,
  {
   align   => sub {unshift @align1, $arr1->[shift]; unshift @align2, $arr2->[shift]},
   shift_a => sub {unshift @align1, $arr1->[shift]; unshift @align2,            '.'},
   shift_b => sub {unshift @align1,            '.'; unshift @align2, $arr1->[shift]},
  });

$arr1 and $arr2 are references to the array of sequences that you would get as

my $arr1 = [split //, $a];

Worked for me.

ADD REPLYlink written 5 months ago by jero.miranda.rod0
0
gravatar for jero.miranda.rod
5 months ago by
jero.miranda.rod0 wrote:

$matcher->gap_open_penalty(-2); $matcher->gap_extend_penalty(-0.1); Before calling matcher->align(); This works very well but then I tried to implement it over a loop, with $a and $b different sequences each time, it says: Use of uninitialized value $a_align in concatenation (.) or string. Same for $b_align. The callback with print shows the algorithm is working. Anyone has any idea as to how to re-initialize $a_align and $b_align?

ADD COMMENTlink written 5 months ago by jero.miranda.rod0

I found this on a russian forum:

my (@align1, @align2);
my $result = $matcher->align($arr1, $arr2,
  {
   align   => sub {unshift @align1, $arr1->[shift]; unshift @align2, $arr2->[shift]},
   shift_a => sub {unshift @align1, $arr1->[shift]; unshift @align2,            '.'},
   shift_b => sub {unshift @align1,            '.'; unshift @align2, $arr1->[shift]},
  });

$arr1 and $arr2 are references to the array of sequences that you would get as

my $arr1 = [split //, $a];

Worked for me.

ADD REPLYlink written 5 months ago by jero.miranda.rod0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1056 users visited in the last hour