Question: Matching Complementary Regions Of 2 Dna Sequences
0
8.5 years ago by
Tonig440
Tonig440 wrote:

Hi , all I ask in other forum how to align two sequences in PERL, basically what I need is to find the complementary matching region between one sequence and the reverse complentary of the other one

Here is the code that I'm using

``````#!/usr/bin/perl
use warnings;
use strict;

sub complement {
\$_[0] =~ y/CGAT/GCTA/;
return \$_[0];
}

sub match {
my (\$s1, \$s2) = @_;
\$s2 = reverse \$s2;
complement \$s2;
print "\$s1\n";
my \$s2l = length \$s2;
for (my \$length = \$s2l; \$length; \$length--) { # start from the longest possible substring
for my \$start (0 .. \$s2l - \$length) {     # starting position of the matching substring
my \$substr = substr \$s2, \$start, \$length;
my \$pos = index \$s1, \$substr;
if (\$pos + 1) {
return ('-' x \$pos) . complement "\$substr" . ('-' x (\$s2l - \$length - \$pos));
}
}
}
}

print match('CGTAAATCTATCTT',
'CATGCGTCTTTACG')
,"\n";
``````

My problem is that using this I only get one result:

``````CGTAAATCTATCTT
GCATTT------A-
``````

and my idea is to find the best complementary match between the two (OK, in this case this is the best one, but you have to imagine when I am dealing with sequences of hundreds and maybe thousands nucleotides), also i'm considering that the sequences will be of different length. I tried to use one approach similar to Smith and Waterman algorithm changing the matrix for complementary matrix: C-G/G-C amd T-A/A-T 1 and the rest 0

perl alignment sequence • 4.0k views
modified 8.5 years ago by Gustavo530 • written 8.5 years ago by Tonig440

SW will not make the gaps needed in this case unless you can penalise them less. Perhaps increase the match values from 1 to something larger and reduce the gap open and extension penalties.

4
8.5 years ago by
Gustavo530
Seattle
Gustavo530 wrote:

You might find it computationally more efficient to use an external tool for doing the sequence comparison. For example, you could write your sequences out to separate files (if they're not already in files to start with!) and use FASTA to do the comparison.

FASTA has a variety of parameters that will make the job easy. In your case, you can specify that you want only comparisons to the reverse strand of the other sequence (-i parameter). You of course have full control of all alignment parameters like matrix, gap penalties, significance cutoffs. You can also specify which format you want to get, and it has some output formats that are extremely easy to parse.

Thanks Gustavo, I'll try it I finally did it using BLAT, but it will be nice trying to compare with FASTA

0
8.5 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

Using Smith and Waterman and changing the matrix sounds reasonable.

Is this for homework? If it's not, why not just convert one of the two sequences into the reverse complement and use one of the many available aligners. So in your example where you are aligning:

``````CGTAAATCTATCTT
CATGCGTCTTTACG
``````

Change the second sequence to its reverse complement:

``````CGTAAATCTATCTT
CGTAAAGACGCATG
``````

And use [?]clustal[?] to align the sequence. You can then transform it back to the original sequence after you get a satisfactory alignment.