Question: Matching Complementary Regions Of 2 Dna Sequences
0
gravatar for Tonig
7.5 years ago by
Tonig430
Tonig430 wrote:

Hi , all I ask in other forum how to align two sequences in PERL, basically what I need is to find the complementary matching region between one sequence and the reverse complentary of the other one

Here is the code that I'm using

#!/usr/bin/perl
use warnings;
use strict;

sub complement {
    $_[0] =~ y/CGAT/GCTA/;
    return $_[0];
}

sub match {
    my ($s1, $s2) = @_;
    $s2 = reverse $s2;
    complement $s2;
    print "$s1\n";
    my $s2l = length $s2;
    for (my $length = $s2l; $length; $length--) { # start from the longest possible substring
            for my $start (0 .. $s2l - $length) {     # starting position of the matching substring
            my $substr = substr $s2, $start, $length;
            my $pos = index $s1, $substr;
            if ($pos + 1) {
                return ('-' x $pos) . complement "$substr" . ('-' x ($s2l - $length - $pos));
            }
        }
    }
}

print match('CGTAAATCTATCTT',
            'CATGCGTCTTTACG')
    ,"\n";

My problem is that using this I only get one result:

CGTAAATCTATCTT
GCATTT------A-

and my idea is to find the best complementary match between the two (OK, in this case this is the best one, but you have to imagine when I am dealing with sequences of hundreds and maybe thousands nucleotides), also i'm considering that the sequences will be of different length. I tried to use one approach similar to Smith and Waterman algorithm changing the matrix for complementary matrix: C-G/G-C amd T-A/A-T 1 and the rest 0

Thanks in advance

perl alignment sequence • 3.6k views
ADD COMMENTlink modified 7.4 years ago by Gustavo530 • written 7.5 years ago by Tonig430

SW will not make the gaps needed in this case unless you can penalise them less. Perhaps increase the match values from 1 to something larger and reduce the gap open and extension penalties.

ADD REPLYlink written 7.5 years ago by Chris Penkett480
4
gravatar for Gustavo
7.4 years ago by
Gustavo530
Seattle
Gustavo530 wrote:

You might find it computationally more efficient to use an external tool for doing the sequence comparison. For example, you could write your sequences out to separate files (if they're not already in files to start with!) and use FASTA to do the comparison.

FASTA has a variety of parameters that will make the job easy. In your case, you can specify that you want only comparisons to the reverse strand of the other sequence (-i parameter). You of course have full control of all alignment parameters like matrix, gap penalties, significance cutoffs. You can also specify which format you want to get, and it has some output formats that are extremely easy to parse.

ADD COMMENTlink written 7.4 years ago by Gustavo530

Thanks Gustavo, I'll try it I finally did it using BLAT, but it will be nice trying to compare with FASTA

ADD REPLYlink written 7.4 years ago by Tonig430
0
gravatar for Damian Kao
7.5 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

Using Smith and Waterman and changing the matrix sounds reasonable.

Is this for homework? If it's not, why not just convert one of the two sequences into the reverse complement and use one of the many available aligners. So in your example where you are aligning:

CGTAAATCTATCTT
CATGCGTCTTTACG

Change the second sequence to its reverse complement:

CGTAAATCTATCTT
CGTAAAGACGCATG

And use [?]clustal[?] to align the sequence. You can then transform it back to the original sequence after you get a satisfactory alignment.

ADD COMMENTlink written 7.5 years ago by Damian Kao15k

no, it is not for my homework, i need to find complmentary regions between sequences coming from an experiment. I don't need to align the sequences, only find complementary regions between them, not to find alignments

ADD REPLYlink written 7.5 years ago by Tonig430
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1043 users visited in the last hour