Question: Perl: String comparison problem
bioinfo1410 wrote:

Hi all,

I have sequences stored in two arrays, @seq1 and @seq2. I need to compare the sequence in @seq2 against every entry in @seq1 and print the mismatched bases at each position.

These are my sequences:

@seq1 = "UAUGUACCGACCUUAUUCUCCU AUGUACCGACCUUAUUCUCCUG UGUACCGACCUUAUUCUCCUGU GUACCGACCUUAUUCUCCUGUG UACCGACCUUAUUCUCCUGUGA ACCGACCUUAUUCUCCUGUGAU CCGACCUUAUUCUCCUGUGAUC CGACCUUAUUCUCCUGUGAUCU GACCUUAUUCUCCUGUGAUCUA ACCUUAUUCUCCUGUGAUCUAC CCUUAUUCUCCUGUGAUCUACU CUUAUUCUCCUGUGAUCUACUA UUAUUCUCCUGUGAUCUACUAU UAUUCUCCUGUGAUCUACUAUA "

@seq2 = " UGGAGUGUGACAAUGGUGUUUG"

and this is my code:

foreach (0..length(@seq1))
{
 my $char = substr($seq2,$_,1);
 if($char ne substr($seq1, @_,1))
{
$result .="$char";
}
else
{
$result .="";
}
}
print $result, "\n";

 

I'm getting errors. I would be grateful if anyone could help me complete this and fix my errors. Thank you.

 

perl • 2.2k views
ADD COMMENTlink modified 5.4 years ago by thackl2.8k • written 5.4 years ago by bioinfo1410

Why is there a @_ in if($char ne substr($seq1, @_,1))? Also, storing a single sequence string in an array (@seq1) does not make sense. Your code has a couple of problems.
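To make the two points above concrete, here is a minimal sketch (not the original poster's code; the shortened two-sequence string is only an illustration). It shows that assigning one string to an array gives a one-element array, and that substr() wants a scalar string plus a numeric offset rather than @_.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assigning a single string to an array yields a ONE-element array.
# (Shortened to two sequences for illustration.)
my @seq1 = ("UAUGUACCGACCUUAUUCUCCU AUGUACCGACCUUAUUCUCCUG");

my $elements = scalar @seq1;      # number of array elements
my $chars    = length $seq1[0];   # characters in that single element

# substr() takes a scalar string and a numeric offset.
# $seq1 (without an index) would be a different, undeclared variable.
my $first_base = substr($seq1[0], 0, 1);

print "$elements element, $chars characters, first base $first_base\n";
```

With use strict enabled, writing $seq1 instead of $seq1[0] is rejected at compile time, which is a quick way to catch this kind of mix-up.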

ADD REPLYlink modified 5.4 years ago • written 5.4 years ago by thackl2.8k

It's not clear, but it's actually a whitespace quoted list
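If the data really is a whitespace-separated list inside one string, a sketch like the following turns it into a proper array. The variable names are assumptions for illustration, and only the first three sequences from the question are used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# One string holding several whitespace-separated sequences (first three only).
my $raw = "UAUGUACCGACCUUAUUCUCCU AUGUACCGACCUUAUUCUCCUG UGUACCGACCUUAUUCUCCUGU ";

# split with the special pattern ' ' splits on runs of whitespace and
# ignores leading whitespace, so stray spaces produce no empty entries.
my @seqs = split ' ', $raw;

# The same call strips the leading space from the query string.
my ($query) = split ' ', " UGGAGUGUGACAAUGGUGUUUG";

print scalar(@seqs), " sequences, query is '$query'\n";
```

For a list typed directly into the script, qw(...) gives the same result without an intermediate string.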

ADD REPLYlink written 5.4 years ago by Daniel3.8k

Yeah, I figured as much, but it took me some time ;). This also means that @landesfeind's solutions won't work...
 

ADD REPLYlink written 5.4 years ago by thackl2.8k

Sorry, I didn't get that, in particular because there is also whitespace when initializing @seq2, and because both @seq1 and @seq2 are later accessed using '$' and substr(). I assumed the whitespace was a mistake or a formatting error.

Since I was the one complaining, I will gladly adjust the code in my answer to match the desired output, if someone can specify what it should look like. Probably more like the following?

my @sequences = qw/UAUGUACCGACCUUAUUCUCCU
                   AUGUACCGACCUUAUUCUCCUG
                   UGUACCGACCUUAUUCUCCUGU
                   GUACCGACCUUAUUCUCCUGUG
                   UACCGACCUUAUUCUCCUGUGA
                   ACCGACCUUAUUCUCCUGUGAU
                   CCGACCUUAUUCUCCUGUGAUC
                   CGACCUUAUUCUCCUGUGAUCU
                   GACCUUAUUCUCCUGUGAUCUA
                   ACCUUAUUCUCCUGUGAUCUAC
                   CCUUAUUCUCCUGUGAUCUACU
                   CUUAUUCUCCUGUGAUCUACUA
                   UUAUUCUCCUGUGAUCUACUAU
                   UAUUCUCCUGUGAUCUACUAUA/;
my $sequence  =   'UGGAGUGUGACAAUGGUGUUUG';

foreach my $s (@sequences){
   my $result = '';
   foreach (0 .. (length($s) - 1)){
      if(substr($sequence, $_, 1) ne substr($s, $_, 1)){
         $result .= substr($sequence, $_, 1);
      }
      else {
         $result .= " ";
      }
   }
   print $result, "\n";
}

which prints

 GGAGUGU   AA GG G UUG
UG AGUGUGA AAUGGUGUU  
  G GU UGACA  GG GU UG
UGGAGUGUGAC A GGUG U  
 GGA UGUGACAAUGGUGU UG
UGGAGUG G CAA GG   UUG
UG  GUG GACAAUGGUGUU G
U GAG GUGA AAUG   U UG
UGGAG G GACAA  GUG U G
UGGAGUG GA AAUG UGU UG
UGGAG GUGA AA  G G UUG
UGG G G GACAAUGGUGUU G
 GGAGUGUGACAAUGG GU UG
 GGAG GUGACA  GGUG U G
ADD REPLYlink written 5.4 years ago by Manuel Landesfeind1.3k

thanks a lot @landesfeind

ADD REPLYlink written 5.4 years ago by bioinfo1410
thackl2.8k wrote:

I'm fully aware that the code below does not qualify as an appropriate teaching example - I'm simply a fan of a little bit of bit magic when it comes to string comparison.

#!/usr/bin/env perl
use warnings;
use strict;

my $q = "UGGAGUGUGACAAUGGUGUUUG";
my @r = qw(
    UAUGUACCGACCUUAUUCUCCU
    AUGUACCGACCUUAUUCUCCUG
    UGUACCGACCUUAUUCUCCUGU
);

foreach my $r (@r) {
    my $xb = $r ^ $q; # compare
    $xb =~ tr/\0/\377/c; # bitmask: every non-NUL byte (mismatch) becomes \377, matches stay \0
    my $xs = $r & $xb; # mismatches
    $xs =~ tr/\0/ /; # add gaps

    print "ref: ", $r,"\n";
    print "qry: ", $q,"\n";
    print "mis: ", $xs,"\n\n";
}

produces:

ref: UAUGUACCGACCUUAUUCUCCU
qry: UGGAGUGUGACAAUGGUGUUUG
mis:  AUGUACC   CU AU C CCU

ref: AUGUACCGACCUUAUUCUCCUG
qry: UGGAGUGUGACAAUGGUGUUUG
mis: AU UACCGAC UUAUUCUCC  

ref: UGUACCGACCUUAUUCUCCUGU
qry: UGGAGUGUGACAAUGGUGUUUG
mis:   U CC ACCUU  UC CC GU
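
As a smaller illustration of the same XOR idea, the sketch below only counts mismatches instead of printing them. It assumes both strings have the same length and uses the query and the first reference from the example above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $q = "UGGAGUGUGACAAUGGUGUUUG";
my $r = "UAUGUACCGACCUUAUUCUCCU";

# XOR of two equal-length strings yields a NUL byte wherever the bases agree.
my $xor = $r ^ $q;

# tr with /c and an empty replacement list does not modify the string;
# it just counts the bytes that are NOT NUL, i.e. the mismatches.
my $mismatches = ($xor =~ tr/\0//c);
my $matches    = length($q) - $mismatches;

print "mismatches: $mismatches, matches: $matches\n";
```

For this pair the count is 15 mismatches and 7 matches, which agrees with the seven gaps in the first "mis:" line above.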
ADD COMMENTlink modified 5.4 years ago • written 5.4 years ago by thackl2.8k

Thank you so much @thackl

ADD REPLYlink written 5.4 years ago by bioinfo1410
Manuel Landesfeind1.3k wrote:

[EDIT] Not a working solution - see comments above [/EDIT]

The following code snippet runs and gives you the desired result even though your approach is highly inefficient.

However, as @thackl mentioned, you made some severe mistakes in your code! I do not want to be rude, but: read an introduction to Perl and learn some basics in programming before you proceed! Without that, your code will continue to be messy and fail, and you will waste time - yours and ours. Sorry.

#!/usr/bin/perl
use strict;
use warnings;

# Split the string sequence into an array
my @seq1 = split('', "UAUGUACCGACCUUAUUCUCCUAUGUACCGACCUUAUUCUCCUGUGU..."); 
my @seq2 = split('', "UGGAGUGUGACAAUGGUGUUUG");

# Determine the length of the shorter sequence
my $length;
if( scalar(@seq1) < scalar(@seq2) ){
   $length = scalar(@seq1);
}
else {
   $length = scalar(@seq2);
}

# Iterate the arrays and store results
my $result = '';
foreach (0 .. ($length - 1)){
   if($seq1[$_] ne $seq2[$_]){
      $result .= $seq2[$_];
   }
   else {
      $result .= " ";
   }
}
print $result, "\n";
ADD COMMENTlink modified 5.4 years ago • written 5.4 years ago by Manuel Landesfeind1.3k

I would recommend Modern Perl, O'Reilly's Learning Perl, or Beginning Perl by Curtis Poe for learning Perl. It is better to get a good introduction from a solid book than try to learn from a 15 year old web article, which is likely to lead to more frustration and problems (it teaches the wrong way to do things right from the beginning).

ADD REPLYlink modified 5.2 years ago • written 5.4 years ago by SES8.4k

Although I support your advice on reading Perl intros, I wouldn't call the above mistakes "horrible". To me this looks like a classic "hands-on" Perl learning attempt. However, given the technical nature of the posted question and the fact that the sequences in question need not be RNA/DNA/AA for the algorithm to apply, I believe the OP should try posting the question on a better-suited forum like PerlMonks. There she/he will likely get more "educational" advice on how to tackle the problem and what a more efficient solution would be, especially since the OP has shown an attempt to resolve it.

mxs
 

ADD REPLYlink written 5.4 years ago by mxs530

Probably "horrible" was too rude, then - I changed it. But I stand by the demand to learn Perl basics. Coming from a computer science background, I think it is crucial to understand the very basics of the programming language in use, e.g., not to mix strings with arrays (at least in Perl), to check array lengths, etc.

+1 for asking for help in the Perl programming community (e.g. PerlMonks) - even though I think they will be even more picky about the coding style ;-)

ADD REPLYlink written 5.4 years ago by Manuel Landesfeind1.3k

@landesfeind Thank you for your reply. I'm new to Perl and still learning, so I tried adapting code from other websites to my problem, but it didn't work. That's why I posted my query here. This is also part of the learning process, and learning from experienced people helps more than books. Your script gives the solution for the first index only (i.e., the first 22 characters). I need to compare @seq2 against @seq1 starting from each position 1 to n (i.e., characters 1 to 22, then 2 to 23, then 3 to 24, and so on), where 1, 2, 3... are character positions in @seq1. Thank you.
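
The sliding comparison described here could be sketched as follows. This is only an illustration under one assumption: the 14 overlapping 22-mers from the question are treated as windows of a single 35-base reference, reconstructed here from their overlaps. The names $reference and mismatches() are hypothetical and not from the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: one reference string rebuilt from the overlapping 22-mers.
my $reference = 'UAUGUACCGACCUUAUUCUCCUGUGAUCUACUAUA';
my $query     = 'UGGAGUGUGACAAUGGUGUUUG';

# Hypothetical helper: return the query bases that differ from the window,
# with a space wherever the two strings agree.
sub mismatches {
    my ($window, $qry) = @_;
    my $result = '';
    for my $i (0 .. length($qry) - 1) {
        my $base = substr($qry, $i, 1);
        $result .= ($base ne substr($window, $i, 1)) ? $base : ' ';
    }
    return $result;
}

# Slide a window of the query's length along the reference, one base at a time.
my $len = length $query;
my @results;
for my $start (0 .. length($reference) - $len) {
    my $window = substr($reference, $start, $len);
    push @results, mismatches($window, $query);
}

print "$_\n" for @results;
```

Taking windows with substr() avoids storing every 22-mer separately; positions here are 0-based, so "1 to 22" in the description corresponds to offset 0.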

ADD REPLYlink written 5.4 years ago by bioinfo1410