Question: Compare two matrices
Amy wrote (2.7 years ago):

I have two matrices. The first one is a genotype matrix in which:

  • Rows represent loci
  • Columns represent samples
  • Each value represents a genotype, which can be P1/P1, P2/P2, P1/P2, or NA if the genotype is not determined.

The second matrix is a matrix of counts. As in the first one:

  • Rows represent loci
  • Columns represent samples
  • Each value represents the count at that locus for that sample.

I would like to use the genotype information to mask the second matrix: the aim is to replace a count with NA wherever the corresponding genotype is not determined (i.e. NA).

Here is an example of my two matrices:

- Genotype matrix

CDS             BC1-III     BC1-IV      BC10-II     
LOC105031928    P1/P2       P1/P2      P1/P2    
LOC105031930    NA          NA         NA   
LOC105031931    P1/P1       P1/P1      P1/P1    
LOC105031933    P1/P1       P1/P1      P1/P1    
LOC105031934    NA          NA         NA   
LOC105031935    P1/P1       P1/P1      P1/P1    
LOC105031937    NA          NA         NA   
LOC105031938    P1/P1       P1/P1      P1/P1

- Count matrix

CDS             BC1-III     BC1-IV      BC10-II     
LOC105031928    175         181.5       99
LOC105031930    10          50          0
LOC105031931    401         691         572
LOC105031933    17          69          15.75
LOC105031934    0           0           0
LOC105031935    6           0           17
LOC105031937    0           0           0
LOC105031938    408         520.1       165

What my script should produce:

CDS             BC1-III     BC1-IV      BC10-II     
LOC105031928    175         181.5       99
LOC105031930    NA          NA          NA
LOC105031931    401         691         572
LOC105031933    17          69          15.75
LOC105031934    NA          NA          NA
LOC105031935    6           0           17
LOC105031937    NA          NA          NA
LOC105031938    408         520.1       165

I could read the genotype matrix line by line and link the two matrices using the CDS as the ID, but I want to make sure that each value is tied to its own CDS and its own sample. I am a beginner in Perl, and I don't yet know how to extract the header and row information from a matrix and then associate them with each value. Thanks for your help.

PS: This is what I have done so far:

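use strict;
use warnings;

# Path to the genotype matrix, read from the command line here (assumption: the original snippet does not show where it is set)
my $matrix_geno = $ARGV[0];

open(my $geno_fh, '<', $matrix_geno) or die "Cannot open $matrix_geno: $!\n";
my %hash_Loc_line = ();    # locus => [genotype of each sample, in column order]
while (my $line = <$geno_fh>)
{
    chomp $line;
    next if ($line =~ /^CDS/);                # skip the header line
    my ($locus, @BC) = split(/\s+/, $line);   # first column is the locus, the rest are the samples
    push @{$hash_Loc_line{$locus}}, @BC;
}
close($geno_fh);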
Tags: matrix, perl. Written 2.7 years ago by Amy (modified 2.7 years ago by Paul).
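A sketch of the lookup Amy describes (keeping each value tied to its CDS and its sample), written in awk because the answers in this thread use shell tools. The first pass records every (CDS, sample) cell whose genotype is NA, keyed by the CDS ID and the sample name taken from the header; the second pass prints the counts with those cells replaced by NA. Because the lookup uses names rather than positions, it should still work if rows or sample columns are ordered differently in the two files. The file names and the demo rows (copied from the question's example) are assumptions. In Perl, the same idea would be a hash of hashes, e.g. $na{$locus}{$sample}.

```shell
# Demo input: first rows of the question's example (file names are hypothetical).
dir=$(mktemp -d)
cat > "$dir/genotypes.txt" <<'EOF'
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
EOF
cat > "$dir/counts.txt" <<'EOF'
CDS BC1-III BC1-IV BC10-II
LOC105031928 175 181.5 99
LOC105031930 10 50 0
LOC105031931 401 691 572
EOF

awk '
FNR == 1 {                                   # header of either file: map column number -> sample name
    for (i = 2; i <= NF; i++) name[i] = $i
}
FNR == NR {                                  # first file (genotypes)
    if (FNR > 1)
        for (i = 2; i <= NF; i++)
            if ($i == "NA") na[$1, name[i]] = 1   # remember each undetermined (locus, sample) cell
    next
}
FNR == 1 { print; next }                     # second file: copy the count header unchanged
{
    out = $1
    for (i = 2; i <= NF; i++) {
        v = $i
        if (($1, name[i]) in na) v = "NA"    # genotype NA -> count NA
        out = out "\t" v
    }
    print out
}' "$dir/genotypes.txt" "$dir/counts.txt" > "$dir/masked.txt"
cat "$dir/masked.txt"
```

Loci that appear in the count file but not in the genotype file are printed with their counts unchanged.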

Hi Amy,

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

Written 2.7 years ago by WouterDeCoster.

Have you tried anything? From your description, it seems you simply want NA in the same positions as in Table 1 and the values from Table 2 everywhere else. That is very easy to solve in R, for example.

Written 2.7 years ago by Michael Dondrup.

Yes, I'm currently working on it, but I still can't find the right way to solve the problem. Unfortunately, I must write the script in Perl only. I'm going to add what I have done so far to my post. Thanks.

Written 2.7 years ago by Amy.

So this is an assignment? That's a pity, because it would be a one-liner in R.

Written 2.7 years ago by Michael Dondrup (modified 2.7 years ago).

Yes it is. Unfortunately :(

Written 2.7 years ago by Amy.

Paul (European Union) wrote (2.7 years ago):

Hi, what about this bash solution:

paste -d '\n' matrix1 matrix2 | awk 'NR>1' | grep -v "P[12]/P[12]" | awk '{if(! a[$1]){print; a[$1]++}}'
  
CDS             BC1-III     BC1-IV      BC10-II     
LOC105031928    175         181.5       99
LOC105031930    NA          NA         NA   
LOC105031931    401         691         572
LOC105031933    17          69          15.75
LOC105031934    NA          NA         NA   
LOC105031935    6           0           17
LOC105031937    NA          NA         NA   
LOC105031938    408         520.1       165

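Note: first interleave the two matrices line by line, then remove the duplicated header, then remove every genotype line that contains a called genotype, and finally remove duplicate lines per locus, keeping the first one (for the loci that still have a genotype line, that first line is always the all-NA one). This assumes that each locus is either NA in all samples or genotyped in all samples.

Written 2.7 years ago by Paul.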

Thank you for your help. I tried it with my data and it worked well. I'll try to combine it with my Perl script. :)

Written 2.7 years ago by Amy.

You're welcome. Sorry for the bash solution; I am not familiar with Perl.

Written 2.7 years ago by Paul.
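The paste-based answers in this thread pair the two files purely by line position (the interleaving approach relies on the genotype line and the count line of the same locus being adjacent), so they are only correct when both files list exactly the same loci in the same order. A minimal sketch of a check for that precondition, assuming whitespace-separated files whose first column is the CDS; the file names and demo rows are assumptions taken from the question's example.

```shell
# Demo files: first rows of the question's example (file names are hypothetical).
dir=$(mktemp -d)
printf 'CDS BC1-III BC1-IV BC10-II\nLOC105031928 P1/P2 P1/P2 P1/P2\nLOC105031930 NA NA NA\n' > "$dir/matrix1"
printf 'CDS BC1-III BC1-IV BC10-II\nLOC105031928 175 181.5 99\nLOC105031930 10 50 0\n' > "$dir/matrix2"

# Keep only the first column (the CDS / locus ID) of each file.
awk '{ print $1 }' "$dir/matrix1" > "$dir/ids1"
awk '{ print $1 }' "$dir/matrix2" > "$dir/ids2"

# cmp -s prints nothing and succeeds only if the two ID lists are identical.
if cmp -s "$dir/ids1" "$dir/ids2"; then
    status=ok
    echo "Same loci in the same order: line-by-line pairing is safe."
else
    status=mismatch
    echo "Loci differ or are out of order: sort both files by CDS, or look values up by CDS instead." >&2
fi
```

If the check fails, sorting both files on the first column (keeping the header on top) or switching to a lookup keyed by CDS avoids silently pairing the wrong rows.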
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (2.7 years ago):

bash:

paste matrix1.txt matrix2.txt | awk '/^CDS/ {print $1"\t"$2"\t"$3"\t"$4; next;} {printf("%s",$1);for(i=2;i<=4;i++) {j=i+4;printf("\t%s",($i == "NA" ? "NA" : $j ));} printf("\n");}'



CDS BC1-III BC1-IV  BC10-II
LOC105031928    175 181.5   99
LOC105031930    NA  NA  NA
LOC105031931    401 691 572
LOC105031933    17  69  15.75
LOC105031934    NA  NA  NA
LOC105031935    6   0   17
LOC105031937    NA  NA  NA
LOC105031938    408 520.1   165

Written 2.7 years ago by Pierre Lindenbaum.

Thank you for your answer. I tried it on my examples and it worked well. But the two matrices I gave are only a small part of my whole files: I have 51 samples and 27011 rows. Do you know how I can modify this code to handle the general case?

Thank you.

Written 2.7 years ago by Amy.

In fact I have 51 samples and 27011 rows.

I'm sure you can try to modify the awk script by yourself.

Written 2.7 years ago by Pierre Lindenbaum.

Okay. Thanks anyway. :)

Written 2.7 years ago by Amy.
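For the 51-sample, 27011-row case, Pierre's awk can be generalized by deriving the number of columns from each line instead of hard-coding 4: after paste, every line has 2 × n fields, the first n from the genotype file and the last n from the count file, so the count matching genotype field i is field i + n. A sketch, assuming both files have the same columns and the same loci in the same order; the file names and demo rows (from the question's example) are assumptions.

```shell
# Demo files: first rows of the question's example (file names are hypothetical).
dir=$(mktemp -d)
cat > "$dir/matrix1.txt" <<'EOF'
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
EOF
cat > "$dir/matrix2.txt" <<'EOF'
CDS BC1-III BC1-IV BC10-II
LOC105031928 175 181.5 99
LOC105031930 10 50 0
LOC105031931 401 691 572
EOF

# paste joins line k of both files; awk then sees 2*n fields:
# $1..$n are the genotype columns (CDS + samples), $(n+1)..$(2n) the count columns.
paste "$dir/matrix1.txt" "$dir/matrix2.txt" | awk '
{
    if (NF % 2 != 0) {                            # both halves must have the same width
        print "line " NR ": column counts differ" > "/dev/stderr"
        exit 1
    }
    n = NF / 2
    line = $1
    for (i = 2; i <= n; i++) {
        if ($1 == "CDS") v = $i                   # header: keep the sample names
        else v = ($i == "NA") ? "NA" : $(i + n)   # NA genotype masks the count
        line = line "\t" v
    }
    print line
}' > "$dir/masked.txt"
cat "$dir/masked.txt"
```

On the real data, only the two input paths would need to change; awk processes one line at a time, so the number of rows should not be a memory concern.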