Question: Printing duplicate data in Perl using hash keys
tg (10), Nigeria, wrote 6.3 years ago:

Hi all, I have a tab-separated file that looks like this:

protein_id               goterm          product

AG8P000026-PA    GO:0003824    catalytic activity
AG8P001026-PA    GO:0004181    metallocarboxypeptidase activity
AG8P001026-PA    GO:0008233    peptidase activity
AG8P001039-PA    GO:0016787    hydrolase activity
AG7P001036-PA    GO:0004182    carboxypeptidase A activity
AG7P001036-PA    GO:0004180    carboxypeptidase activity
AG7P001040-PA    GO:0022237    metallopeptidase activity
 

 

When I use a Perl hash, it prints:

AG8P000026-PA    GO:0008233    peptidase activity
AG8P001039-PA    GO:0016787    hydrolase activity
AG7P001036-PA    GO:0004180    carboxypeptidase activity
AG7P001040-PA    GO:0022237    metallopeptidase activity

But I want it to print:

AG8P000026-PA    GO:0003824    catalytic activity
AG8P001026-PA    GO:0004181    metallocarboxypeptidase activity
AG8P001026-PA    GO:0008233    peptidase activity
AG8P001039-PA    GO:0016787    hydrolase activity
AG7P001036-PA    GO:0004182    carboxypeptidase A activity
AG7P001036-PA    GO:0004180    carboxypeptidase activity
AG7P001040-PA    GO:0022237    metallopeptidase activity

How do I modify this code, please?

Perl code:

my $filename = "data";
open(my $INFILE, $filename) || die("Error in reading file $filename");
my %infodata;
while (my $line = <$INFILE>)
{
    chomp $line;
    my ($id, @info) = split /\t/, $line;
    $infodata{$id} = join("\t", @info);
}

Tags: hash, duplicates, perl • 3.9k views
modified 6.3 years ago by JC (12k) • written 6.3 years ago by tg (10)

Why do you think it prints Ben only once?

written 6.3 years ago by russhh (5.5k)

In Perl, a hash keeps each key only once and discards the rest. I am writing a program that is similar to what is posted, and I get similar output.

written 6.3 years ago by tg (10)

Hello tg!

We believe that this post does not fit the main topic of this site.

This isn't a perl forum.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 6.3 years ago by Devon Ryan (98k)

I disagree. Perl is a programming language I use in bioinformatics. The problem I have is modelled by what I just posted. It's not fair to do this.

written 6.3 years ago by tg (10)

Arguing with an administrator about what's fair is about the least effective thing one can do (hint: I'm familiar with the community standards here). If you update your question to at least include a biologically relevant example then it'll be relevant to the site and I'll reopen the post (add a reply too, since I don't get notified when posts are modified).
 

written 6.3 years ago by Devon Ryan (98k)

I have put up a sample of what I am working on. Can you please reopen my thread so I can get help from people who care?

written 6.3 years ago by tg (10)

Yup, it's been reopened.

written 6.3 years ago by Devon Ryan (98k)

Hi Devon, this is a bioinformatics forum, and I guess one is free to ask a question to solve a problem one has doubts about. You say "we believe", but you are acting alone, because I don't believe I have asked a silly question. My program solves a bioinformatics question; I just put up a model of it. That is a harsh and insensitive thing to do.

modified 6.3 years ago • written 6.3 years ago by tg (10)

Hi tg,

I do not know the first revision of your question, but I guess it didn't show the example data with GO terms, is this correct? If so, the initial decision to close the question was correct. If you have doubts about what kind of questions you are free to post, then instead of complaining you should take the advice to improve your question. Your question is still only weakly related to bioinformatics, because it is mainly about basic programming (use of a hash) applied to parsing a tab-separated file which by chance contains GO terms. That's good enough to keep your question open. But you still need to improve it and make it more specific, because at the moment your desired output is identical to the input. So why parse at all?

written 6.3 years ago by Michael Dondrup (48k)

The original version dealt with students in different classes and their scores. Hopefully tg will update again with the actual goal, since it's also completely unclear to me why one would use a hash simply to copy and print a TSV's contents.

written 6.3 years ago by Devon Ryan (98k)

I want to add annotation to my GenBank file. Several rows share the same protein ID but carry different annotation details, so only one entry per protein ID gets updated. I want to be able to update all of the entries for each protein ID, but it seems a hash can hold only one value per key. That is what the sample data I posted reflects.

written 6.3 years ago by tg (10)

Felix_Sim (250), United Kingdom, wrote 6.3 years ago:

Your problem lies in using identical keys and assigning different values to them. What you're doing is basically similar to the following:

$x = 10;
$x = 11;
$x = 12;
print $x;


which prints 12, because that is the last value assigned.
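The same effect can be shown with a hash directly. This is only a minimal sketch using three rows copied from the question; the @rows array and $dup_id are illustrative names, not part of the original script.

```perl
use strict;
use warnings;

# Three sample rows from the question: protein_id, GO term, product.
my @rows = (
    "AG8P001026-PA\tGO:0004181\tmetallocarboxypeptidase activity",
    "AG8P001026-PA\tGO:0008233\tpeptidase activity",
    "AG8P001039-PA\tGO:0016787\thydrolase activity",
);

my %infodata;
for my $line (@rows) {
    my ($id, @info) = split /\t/, $line;
    # Plain assignment: a later row with the same $id replaces the earlier one.
    $infodata{$id} = join("\t", @info);
}

# Three rows went in, but only two distinct keys remain.
printf "%d rows, %d keys\n", scalar(@rows), scalar(keys %infodata);

# The duplicated ID holds only the row that was assigned last.
my $dup_id = 'AG8P001026-PA';
print "$dup_id => $infodata{$dup_id}\n";
```

Counting the keys after the loop is a quick way to check whether any rows were lost to duplicate keys.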

To solve your problem you need to consider the following lines:

my ($id,@info)= split /\t/,$line;
$infodata{$id} = join("\t", @info);

With every run of your while loop you will assign a new value to a previously assigned key! In order to solve this you may want to consider using a different key (remember, they have to be unique to avoid the problem you're encountering). Consider this maybe:

my @info = split /\t/, $line;
$infodata{$info[1]} = join("\t", @info);

What I have done is use the GO term as the key instead of the protein_id. The GO terms seem to be unique, at least in the data you've provided, so this would make sense. You can still print the entire line, because the whole line is the value stored under each GO term key.
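One caveat with keying on the GO term alone: in a larger annotation table the same GO term is usually assigned to several proteins, so it would collide in the same way. Here is a hedged sketch that assumes the pair (protein_id, GO term) is unique per row. The composite key and the @order array are illustrative choices of mine, and the rows are copied from the question.

```perl
use strict;
use warnings;

my @rows = (
    "AG8P000026-PA\tGO:0003824\tcatalytic activity",
    "AG8P001026-PA\tGO:0004181\tmetallocarboxypeptidase activity",
    "AG8P001026-PA\tGO:0008233\tpeptidase activity",
    "AG8P001039-PA\tGO:0016787\thydrolase activity",
);

my (%infodata, @order);
for my $line (@rows) {
    my ($id, $go, $product) = split /\t/, $line;
    # Composite key (an assumption): protein ID and GO term together.
    my $key = "$id\t$go";
    # Hashes are unordered, so remember the order in which keys first appear.
    push @order, $key unless exists $infodata{$key};
    $infodata{$key} = $product;
}

# Print every row back in input order.
print "$_\t$infodata{$_}\n" for @order;
```

If two rows really are identical in both columns, the later one still replaces the earlier one, which is harmless here because they carry the same annotation.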

 

 


 

JC (12k), Mexico, wrote 6.3 years ago:

As others already mentioned, a Perl hash overwrites the stored value each time a duplicate key is assigned. Instead of assigning the content, you can append to the value if the key already exists:

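my $filename = "data";
open(my $INFILE, $filename) || die("Error in reading file $filename");
my %infodata;
while (my $line = <$INFILE>)
{
    # No chomp: the trailing newline keeps appended records on separate lines.
    my ($id) = split /\t/, $line;
    # Append the full line to whatever is already stored under this ID.
    $infodata{$id} .= $line;
}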

Generally speaking, this is often a useful approach but it's not clear what your code is supposed to be doing. Also, I don't think it would make sense in this case because how would you separate the values? It would be easier to just join the values based on some character, or better yet, use an array to hold values for a key.

written 6.3 years ago by SES (8.4k)
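The array-per-key idea from the comment above could look roughly like this. It is a sketch, not a drop-in replacement: group_by_id is a hypothetical helper name, the rows are copied from the question, and it assumes the first tab-separated column is the protein ID.

```perl
use strict;
use warnings;

# Hypothetical helper: group tab-separated lines by their first column.
# Returns a hash ref (ID => array ref of annotations) and the IDs in input order.
sub group_by_id {
    my @lines = @_;
    my (%by_id, @order);
    for my $line (@lines) {
        chomp $line;
        next unless length $line;          # skip empty lines
        my ($id, @info) = split /\t/, $line;
        push @order, $id unless exists $by_id{$id};
        # Each key holds an array, so duplicate IDs accumulate instead of overwriting.
        push @{ $by_id{$id} }, join("\t", @info);
    }
    return (\%by_id, \@order);
}

my @rows = (
    "AG8P000026-PA\tGO:0003824\tcatalytic activity\n",
    "AG8P001026-PA\tGO:0004181\tmetallocarboxypeptidase activity\n",
    "AG8P001026-PA\tGO:0008233\tpeptidase activity\n",
    "AG8P001039-PA\tGO:0016787\thydrolase activity\n",
);

my ($by_id, $order) = group_by_id(@rows);

# Print one line per stored annotation, in the order the IDs first appeared.
for my $id (@$order) {
    print "$id\t$_\n" for @{ $by_id->{$id} };
}
```

Keeping the values in an array avoids having to choose a separator and split the string again later, and every annotation for a protein ID can be looked up with @{ $by_id->{$id} } when updating the GenBank records.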