Question: Protein Count Matrix In Perl
0
gravatar for loicatraile
6.3 years ago by
loicatraile40
Germany
loicatraile40 wrote:

Hi!, I´m trying to create a 20x20 Count matrix in perl for a Protein alignment. The matrix should be simmetric but im not getting good results. Does anyone knows how to do this right? i have wrote this so far:

Here´s my Protein alignment:

>EOG50000C|AMELL_prerelease2|GB14042-PA
MA--------------------------------------LIHRVNINFGFVLNTFYCKC
IVNIKRCK----TEYNIKSQTNVKYGENITFGDMLKFRQTQAKKSVLIGV--PENYIYDM
EKYCLKHVNIKKILFYNTET----NRCFVLMELMTEKDVEIFRNMASCK----------I
NSNAHSVTPLFLY-----------------KVRNSSYNKPMTKKSIMFKPPNFDQIKES-
LKNKN--TISEQMIILHHLLKITDLDIRLRFFTAEQISYYLSRLFMNVSVIPFGSSVNGF
GQIGCDLDLLY-------------------------------RNEQKEFLDAISTIMKIC
IPGIDNIKKILEARVPIIKFFNQTTNMHCDLSSTNVIALHMSELLYTYGELDWRVKPLIC
TIRKWARNTNITKDI-SGQWITNFSLTLLILFYLQIK-----NILPSLKVIKYHIKN---
S-------------------------RRSWSND-----WKNSINYNNDSLHNLLYGFFEY
YSIFDFKMQAICIKEGKPRVKKD--SSPLYIYNPFDESLNVSKNITIFELTRLIDHFRKA
LQILIETS----------------EKDAIIKLIKINSISSNSIEMDDCK----DIEKSTE
IDNIEKDSDTFNQKES--------LNESINKINVNVLETMEKNIIK--------------
------------------------------------------------------------
---------------------------------------
>EOG50000C|PHUMA_1.2|PHUM400420-PA
M-----------------------------------------------------------
------------------------------------------------------------
---------------YKFET---NSMKYVLLEMENGEQLSNLINHSKSHGGGGGGGKRGK
FSILNSLGGHFGYFNDSIVYDHEKIIPTTTVMENYDYYYSNVNNND--DNSKTNSIRKL-
LKEYKTLDLDNQIRLLHDSYKLNDLGYRLRYMMWKQIQNLVNVLLPTHSVYPFGSAVNGL
GDVTSDLDLVVLRDDSTS--------------FRWHKNPGVDINSVQNDLSLLTNVMTKT
SVGCAKVVFIKQAKVPIIKYKHSFTGVECDVSMHQTEALKMSQILFALSNFDPRIKPLIF
FIKIWARELRLTREQ-PGPTITNFSLILLTIFFLQQKGSSHTEILPPFDLFPLRDTD---
ANDDVMT-MENTDEKLECILNNFKCHVTSSSSSSTSSPTSSPSAPNGKSTVELLLLFYKF
YSVFNFNSYGCCLNTGQIIRKNNFKNYGMYIKNPFCSIYNVSKNVSLSEVNKFQFYCQIA
ATLLEESETTISRNSTTRTAVVDESKFLLNFLNYNSYANELMAKKLSNNNNINSNVKFAD
LKSIF-------------------------------------------------------
------------------------------------------------------------
---------------------------------------

Here´s my Code:

  use warnings;

    use Data::Dumper;

    my $datei = shift @ARGV;


    $datei = '/Users/Maxi/Desktop/LernenPerl/alignmentpro.txt';

    open(my $fastd,'<', $datei) or die "die datei $datei wurde nicht geöffnet: $!\n";


    my %sequences = ();
    my $header = '';
    my $reverse='';
    my $dnaheaders= 0;
    my $length=0;

    while (my $line = <$fastd>) {
        chomp $line;
        if ($line =~ /^>/) {
            $header = $line;
            $header =~ s/>//g;

        }
        else {
            $sequences{$header} .= $line
        }
    }


    my $seq1= $sequences{'EOG50000C|PHUMA_1.2|PHUM400420-PA'};
    my $seq2= $sequences{'EOG50000C|AMELL_prerelease2|GB14042-PA'};


    my $matrix={};

    LINE: for (my $i=0; $i< length $seq1; $i++){
        my ($substr1,$substr2) = (substr($seq1,$i,1),substr($seq2, $i,1));
        next LINE if ("$substr1$substr2" =~ /-/);
    if ($substr1 eq $substr2){
            $matrix->{$substr1}->{$substr2}++;

        }elsif ($substr1 ne $substr2){
            $matrix->{$substr1}->{$substr2} += 1/2;
            $matrix->{$substr2}->{$substr1} += 1/2;
        }

    }



    print "     A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y\n";
  foreach my $key (sort keys %$matrix) {
    print $key." ";
    foreach my $key2 (sort keys %{$matrix->{$key}}) {
        printf("%3d ",$matrix->{$key}->{$key2});
    }
    print "\n"x2;
}


    print Dumper $matrix;

the matrix should see like this (not for this alignment.. just an example):

 4372  121  113  162   96  140  324  387   40  136  212  219   72   50  213  596  300    6   39  428
  121 2748  100   70   20  188  170   88   69   48   85  679   38   26   46  129  110   10   30   59
  113  100 1727  322   16   92  166  190   89   37   42  209   17   18   38  207  160    4   40   45
  162   70  322 3092    8  127  732  154   68   18   36  176   11   14   67  204  120    4   24   28
   96   20   16    8  680    6    6   20    6   30   30   12   10   22    6   61   54    7   16   83
  140  188   92  127    6 1367  346   58   70   35  100  310   52   18   48  128  103    6   30   66
  324  170  166  732    6  346 3378  138   63   40   68  411   29   13   93  212  192    6   28  104
  387   88  190  154   20   58  138 5788   30   19   36  136   14   17   46  230   78    8   12   40
   40   69   89   68    6   70   63   30 1059   14   39   92   16   54   34   48   40   13   96   27
  136   48   37   18   30   35   40   19   14 3217  858   72  198  144   24   55  156   12   37 1300
  212   85   42   36   30  100   68   36   39  858 5061  124  410  331   62  104  133   20   92  593
  219  679  209  176   12  310  411  136   92   72  124 3054   48   23   84  216  172    4   32   84
   72   38   17   11   10   52   29   14   16  198  410   48  923   82   12   38   64    5   24  134
   50   26   18   14   22   18   13   17   54  144  331   23   82 2191   14   38   39   59  367  101
  213   46   38   67    6   48   93   46   34   24   62   84   12   14 2757  147   66    2   10   54
  596  129  207  204   61  128  212  230   48   55  104  216   38   38  147 2326  490   13   28  100
  300  110  160  120   54  103  192   78   40  156  133  172   64   39   66  490 2808    4   31  299
    6   10    4    4    7    6    6    8   13   12   20    4    5   59    2   13    4  582   48   14
   39   30   40   24   16   30   28   12   96   37   92   32   24  367   10   28   31   48 1612   42
  428   59   45   28   83   66  104   40   27 1300  593   84  134  101   54  100  299   14   42 3784
perl bioinformatics matrix • 1.7k views
ADD COMMENTlink modified 6.3 years ago by Jordan1.1k • written 6.3 years ago by loicatraile40
1
gravatar for Jordan
6.3 years ago by
Jordan1.1k
Pittsburgh
Jordan1.1k wrote:

I think you have to initialize $matrix as 20x20 matrix with values in it as zero first. This way the zeros will make up for the missing holes. For e.g., there might not be any substitution from A->H. If you do not initialize it to be zero, then the cell will move one position to the left. The A,H will hold the value of A,L instead. (I'm telling this in reference to your print statement).

Do you understand what I mean?

ADD COMMENTlink written 6.3 years ago by Jordan1.1k

hi! thank you for your answer... yes i think you are right but how do i replace all the zeros with the numbers i need? i mean if i initialize the matrix like this:

 my $matrix ={};

for (my $i=0; $i<20;$i++){
for (my $j=0; $j<20;$j++){
        $matrix->{$i}->{$j}= 0;
    }
}

and i introduce only that code in my code i get a 20x20 matrix with zeros + what i had before

thank you for your help!

ADD REPLYlink modified 6.3 years ago • written 6.3 years ago by loicatraile40
1

One way to work around this is, instead of using hashes you could use an array (@matrix). And instead of having letters A - V, you could use numbers from 0-19. And use a hash called %aminos where each key is an amino acid and value is a number. For e.g., $aminos{"A"} = 0, $aminos{"C"} = 1, etc.

So your if statement in LINE: will be something like $matrix->[$aminos{$substr1}]->[$aminos{$substr2}]++;

And you can print using the double for loop like you used in the above comment. This way you can only deal with numbers and you can still stick with your code logic.

Did you see what I did there?

ADD REPLYlink modified 6.3 years ago • written 6.3 years ago by Jordan1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1662 users visited in the last hour