Question

Matching Complementary Regions Of 2 DNA Sequences

12.4 years ago

Tonig ▴ 440

Hi all. I asked in another forum how to align two sequences in Perl. Basically, what I need is to find the complementary matching region between one sequence and the reverse complement of the other one.

Here is the code that I'm using

#!/usr/bin/perl
use warnings;
use strict;

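# Complement a DNA string (A<->T, C<->G) and return the new string.
sub complement {
    my ($seq) = @_;
    $seq =~ tr/ACGT/TGCA/;
    return $seq;
}

# Print $s1, then return the longest substring of the reverse complement of
# $s2 that also occurs in $s1, complemented and padded with '-' under $s1.
sub match {
    my ($s1, $s2) = @_;
    $s2 = complement(scalar reverse $s2);
    print "$s1\n";
    my ($s1l, $s2l) = (length($s1), length($s2));
    for (my $length = $s2l; $length; $length--) {   # start from the longest possible substring
        for my $start (0 .. $s2l - $length) {       # starting position of the matching substring
            my $substr = substr $s2, $start, $length;
            my $pos = index $s1, $substr;
            next if $pos < 0;                       # this substring does not occur in $s1
            # pad to the length of $s1 so the match lines up under it
            return ('-' x $pos) . complement($substr) . ('-' x ($s1l - $length - $pos));
        }
    }
    return '';                                      # no complementary region found
}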

print match('CGTAAATCTATCTT',
            'CATGCGTCTTTACG')
    ,"\n";

My problem is that using this I only get one result:

CGTAAATCTATCTT
GCATTT------A-

My idea is to find the best complementary match between the two. (OK, in this case this is the best one, but imagine dealing with sequences of hundreds or maybe thousands of nucleotides.) The sequences will also be of different lengths. I tried an approach similar to the Smith-Waterman algorithm, replacing the scoring matrix with a complementarity matrix: C-G/G-C and T-A/A-T score 1, and everything else scores 0.
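One way to get every complementary region, rather than only the first longest one, is to fill a longest-common-substring table between the first sequence and the reverse complement of the second, and report each maximal run. This is only a minimal sketch: it assumes upper-case A/C/G/T input and exact matches (no gaps, no mismatches), and the names `revcomp`, `complementary_regions` and the `$min` length cutoff are made up for illustration.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (upper-case A, C, G, T only).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Return every maximal exact match of at least $min bases between $s1 and the
# reverse complement of $s2, as [start_in_s1, start_in_revcomp_s2, length],
# longest first. $min is a hypothetical cutoff chosen by the caller.
sub complementary_regions {
    my ($s1, $s2, $min) = @_;
    my @x = split //, $s1;
    my @y = split //, revcomp($s2);
    my ($n, $m) = (scalar @x, scalar @y);
    my @prev = (0) x ($m + 1);   # run lengths ending in the previous row
    my @found;
    for my $i (1 .. $n) {
        my @cur = (0) x ($m + 1);
        for my $j (1 .. $m) {
            next unless $x[$i - 1] eq $y[$j - 1];
            my $len = $cur[$j] = $prev[$j - 1] + 1;
            # The run is maximal when the next pair of bases does not match.
            my $extends = $i < $n && $j < $m && $x[$i] eq $y[$j];
            push @found, [$i - $len, $j - $len, $len] if !$extends && $len >= $min;
        }
        @prev = @cur;
    }
    my @sorted = sort { $b->[2] <=> $a->[2] || $a->[0] <=> $b->[0] } @found;
    return @sorted;
}

my $seq1 = 'CGTAAATCTATCTT';
for my $hit (complementary_regions($seq1, 'CATGCGTCTTTACG', 3)) {
    my ($p1, $p2, $len) = @$hit;
    printf "s1 pos %d, revcomp(s2) pos %d, length %d: %s\n",
        $p1, $p2, $len, substr($seq1, $p1, $len);
}
```

The table costs time proportional to the product of the two lengths, which should still be manageable for sequences of a few thousand nucleotides.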

Thanks in advance

perl alignment sequence • 5.6k views

updated 12.4 years ago by Gustavo ▴ 530 • written 12.4 years ago by Tonig ▴ 440

SW will not make the gaps needed in this case unless you can penalise them less. Perhaps increase the match values from 1 to something larger and reduce the gap open and extension penalties.

12.4 years ago by Chris Penkett ▴ 490
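To see the effect of raising the match score and lowering the gap penalty, here is a rough Smith-Waterman sketch with adjustable scores. It is a simplification: it uses a single linear gap penalty instead of separate gap-open and gap-extension penalties, and the default values (match 5, mismatch -4, gap -2) and the name `local_align` are assumptions for illustration, not tuned or recommended settings. To look for complementary regions, pass the reverse complement of the second sequence as the second argument.

```perl
use strict;
use warnings;

# Smith-Waterman local alignment of $s against $t with a linear gap cost.
# Returns (best score, aligned part of $s, aligned part of $t).
sub local_align {
    my ($s, $t, %opt) = @_;
    my $match    = $opt{match}    // 5;    # illustrative defaults only
    my $mismatch = $opt{mismatch} // -4;
    my $gap      = $opt{gap}      // -2;
    my @x = split //, $s;
    my @y = split //, $t;
    my @H = map { [ (0) x (@y + 1) ] } 0 .. @x;   # score matrix, all zeros
    my ($best, $bi, $bj) = (0, 0, 0);
    for my $i (1 .. @x) {
        for my $j (1 .. @y) {
            my $sub  = $x[$i - 1] eq $y[$j - 1] ? $match : $mismatch;
            my $diag = $H[$i - 1][$j - 1] + $sub;
            my $up   = $H[$i - 1][$j] + $gap;
            my $left = $H[$i][$j - 1] + $gap;
            my $h = 0;                             # local alignment never drops below 0
            for ($diag, $up, $left) { $h = $_ if $_ > $h }
            $H[$i][$j] = $h;
            ($best, $bi, $bj) = ($h, $i, $j) if $h > $best;
        }
    }
    # Trace back from the best cell until the score reaches zero.
    my ($i, $j) = ($bi, $bj);
    my ($top, $bottom) = ('', '');
    while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
        my $sub = $x[$i - 1] eq $y[$j - 1] ? $match : $mismatch;
        if ($H[$i][$j] == $H[$i - 1][$j - 1] + $sub) {
            $top    = $x[--$i] . $top;
            $bottom = $y[--$j] . $bottom;
        } elsif ($H[$i][$j] == $H[$i - 1][$j] + $gap) {
            $top    = $x[--$i] . $top;
            $bottom = '-' . $bottom;
        } else {
            $top    = '-' . $top;
            $bottom = $y[--$j] . $bottom;
        }
    }
    return ($best, $top, $bottom);
}

# Second sequence from the question, already reverse-complemented.
my ($score, $top, $bottom) = local_align('CGTAAATCTATCTT', 'CGTAAAGACGCATG');
print "score $score\n$top\n$bottom\n";
```

Making the gap penalty more negative, for example `local_align($s, $t, gap => -10)`, pushes the result back towards ungapped matches.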

score 4 · Answer 1 · 2011-12-13

You might find it computationally more efficient to use an external tool for doing the sequence comparison. For example, you could write your sequences out to separate files (if they're not already in files to start with!) and use FASTA to do the comparison.

FASTA has a variety of parameters that will make the job easy. In your case, you can specify that you want only comparisons to the reverse strand of the other sequence (the -i parameter). You of course have full control of all alignment parameters, such as the scoring matrix, gap penalties, and significance cutoffs. You can also specify which output format you want, and some of the output formats are extremely easy to parse.
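A sketch of how this could be driven from Perl: write each sequence to its own FASTA file, then build the command line. Several things here are assumptions to check against your installation: the executable name `fasta36`, the `-m 8` option for tabular output, and the sequence names `seq1` and `seq2`. The command is only printed, not run.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write one sequence to a FASTA file and return the file name.
sub write_fasta {
    my ($path, $name, $seq) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} ">$name\n$seq\n";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Build (but do not run) a FASTA command line.
# -i   : compare only the reverse complement of the query
# -m 8 : tabular output (assumed; verify with your FASTA version)
sub fasta_command {
    my ($query_file, $subject_file) = @_;
    return ('fasta36', '-i', '-m', '8', $query_file, $subject_file);
}

my $dir     = tempdir(CLEANUP => 1);
my $query   = write_fasta("$dir/seq1.fa", 'seq1', 'CGTAAATCTATCTT');
my $subject = write_fasta("$dir/seq2.fa", 'seq2', 'CATGCGTCTTTACG');
my @cmd     = fasta_command($query, $subject);
print "@cmd\n";

# Once FASTA is installed, the command could be run with, for example:
# system(@cmd) == 0 or die "fasta36 failed: $?";
```

Passing the command as a list to `system` avoids shell quoting problems with file names.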

score 0 · Answer 2 · 2011-11-29

12.4 years ago

Damian Kao 16k

Using Smith and Waterman and changing the matrix sounds reasonable.

Is this for homework? If not, why not just convert one of the two sequences into its reverse complement and use one of the many available aligners? In your example, you are aligning:

CGTAAATCTATCTT
CATGCGTCTTTACG

Change the second sequence to its reverse complement:

CGTAAATCTATCTT
CGTAAAGACGCATG

And use Clustal to align the sequences. You can then transform it back to the original sequence after you get a satisfactory alignment.
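The conversion and the step back to the original strand can be sketched in Perl as follows. The helper names `revcomp` and `map_to_original` are made up for illustration, coordinates are 0-based, and the region used (the first six bases, CGTAAA, which are identical in the two sequences shown above) is just an example.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (upper-case A, C, G, T only).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGT/TGCA/;
    return $rc;
}

# Map a region found on the reverse complement back to the original strand.
# $start is the 0-based start on the reverse complement; the return value is
# the 0-based start of the same region on the original sequence, where it
# reads in the opposite direction.
sub map_to_original {
    my ($orig_len, $start, $len) = @_;
    return $orig_len - $start - $len;
}

my $s2 = 'CATGCGTCTTTACG';
my $rc = revcomp($s2);                      # sequence handed to the aligner
my ($start, $len) = (0, 6);                 # example region on $rc
my $orig_start = map_to_original(length($s2), $start, $len);
my $orig_region = substr $s2, $orig_start, $len;

print "reverse complement: $rc\n";
print "region on original strand: $orig_region at position $orig_start\n";
```

Taking the reverse complement of `$orig_region` gives back the region that was aligned, which is a quick sanity check on the coordinates.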

No, it is not homework. I need to find complementary regions between sequences coming from an experiment. I don't need to align the sequences, only to find the complementary regions between them.

12.4 years ago by Tonig ▴ 440