Compare key/value pairs of Perl hash tables
4
1
6.9 years ago
Amy • 0

Hi Guys,

Does anyone know how to compare key/value pairs in two hash tables? I'm currently working with two tab-delimited files. Each one contains a list of SNPs with their location and position. I need to compare both files and get the location/position pairs common to both. The input files look like this:

-File1:

LOCATION             POSITION 
LOC105032014         221                 
LOC105032014         222                 
LOC105032014         371                 
LOC105032014         434                 
LOC105032014         1271

-File2:

LOCATION             POSITION          
LOC105032014         193                 
LOC105032014         371                 
LOC105032014         1097                
LOC105032014         1102                
LOC105032014         1111                
LOC105032014         1119                               
LOC105032014         1271

My output should give something like:

LOCATION             POSITION 
LOC105032014         1271                
LOC105032014         371

Any help will be welcome. Thanks !

Perl SNPs Hashes • 5.1k views
ADD COMMENT
1

It seems your key is both the location and the position. I would use a hash keyed on the pair, like $hash{"$location:$position"} = 1.
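
For anyone who wants to see the composite-key idea spelled out, here is a minimal sketch. The sub names (index_pairs, common_pairs) and the inline sample data are made up for illustration, and it assumes whitespace-separated columns with a LOCATION/POSITION header row; in practice you would read the lines from your two files.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "LOCATION POSITION" lines into a lookup hash
# keyed on "location:position". Header and blank lines are skipped.
sub index_pairs {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        my ($location, $position) = split ' ', $line;
        next unless defined $position;      # blank or short line
        next if $location eq 'LOCATION';    # header row
        $seen{"$location:$position"} = 1;
    }
    return \%seen;
}

# Return the pairs of the first list that also occur in the second,
# in the order they appear in the first list.
sub common_pairs {
    my ($lines_a, $lines_b) = @_;
    my $in_b = index_pairs(@$lines_b);
    my (@common, %reported);
    for my $line (@$lines_a) {
        my ($location, $position) = split ' ', $line;
        next unless defined $position;
        my $key = "$location:$position";
        next unless $in_b->{$key};
        next if $reported{$key}++;          # report each pair only once
        push @common, "$location\t$position";
    }
    return @common;
}

# Sample data (a subset of the rows from the question).
my @file1 = ("LOCATION POSITION", "LOC105032014 221", "LOC105032014 371", "LOC105032014 1271");
my @file2 = ("LOCATION POSITION", "LOC105032014 193", "LOC105032014 371", "LOC105032014 1271");
print "$_\n" for common_pairs(\@file1, \@file2);
```

With the sample rows above this prints the 371 and 1271 lines for LOC105032014. The ":" separator is safe here only because neither column can contain a colon.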

ADD REPLY
1

Not so much an issue here, probably, but watch out for duplicate keys when you make a custom key like this.

ADD REPLY
3
6.9 years ago

You're looking for lines that are common to two files. There are plenty of solutions on the net already. You could use the comm command (which expects sorted input) or, if you want to use Perl and both files fit in memory, you can read them into arrays and use List::Compare.

ADD COMMENT
0

List::Compare is great

ADD REPLY
3
6.9 years ago
EagleEye 7.5k

I would rather solve it in an easier and quicker way:

grep -w -Ff File2.txt File1.txt > commonFile1File2.txt

Sorry if I misunderstood your question.

ADD COMMENT
0

You got it right. And it worked as well as the Perl script. Such a magic trick. Thank you. :)

ADD REPLY
2
6.9 years ago

I wrote a script that suggests one approach:

Usage:

$ intersect.pl --fileA="A.txt" --fileB="B.txt" > answer.txt

This uses Perl's experimental "smartmatch" feature, which can give an annoying warning message that can be discarded by directing standard error to /dev/null. If you're not comfortable using experimental features, there is a List::Util library that offers limited set-style operations.

ADD COMMENT
0

Just because you mention sets, I'll also mention the Set::Scalar module.

ADD REPLY
0

It gave me exactly what I wanted. Thank you so much for your help. :)

ADD REPLY
1
6.9 years ago
mastal511 ★ 2.1k

You could iterate through one hash and, for each gene ID (location) key, check whether it is present in the other hash and whether the values are the same, printing the location and position whenever you find matching values. But you would have to build hashes of arrays, because hash keys in Perl must be unique, so each gene ID can appear only once as a key. This may not be the most efficient way to go about it.
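
A rough sketch of that hash-of-arrays approach, in case it helps. The sub names (positions_by_location, shared_positions) and the inline rows are invented for the example, and it assumes whitespace-separated columns with an optional LOCATION/POSITION header.

```perl
use strict;
use warnings;

# Hypothetical helper: group positions by location into a hash of arrays,
# e.g. { LOC105032014 => [221, 371, 1271] }.
sub positions_by_location {
    my @lines = @_;
    my %positions;
    for my $line (@lines) {
        my ($location, $position) = split ' ', $line;
        next unless defined $position;      # blank or short line
        next if $location eq 'LOCATION';    # header row
        push @{ $positions{$location} }, $position;
    }
    return \%positions;
}

# For every location present in both hashes, collect the positions they share.
sub shared_positions {
    my ($hash_a, $hash_b) = @_;
    my @shared;
    for my $location (sort keys %$hash_a) {
        next unless exists $hash_b->{$location};
        # Temporary lookup so each position in A is checked in constant time.
        my %in_b = map { $_ => 1 } @{ $hash_b->{$location} };
        for my $position (@{ $hash_a->{$location} }) {
            push @shared, "$location\t$position" if $in_b{$position};
        }
    }
    return @shared;
}

# Sample data (a subset of the rows from the question).
my $a_pos = positions_by_location("LOC105032014 221", "LOC105032014 371", "LOC105032014 1271");
my $b_pos = positions_by_location("LOC105032014 193", "LOC105032014 371", "LOC105032014 1271");
print "$_\n" for shared_positions($a_pos, $b_pos);
```

Locations come out in sorted order and positions in the order they appear in the first input. Compared with a single composite key, this keeps the per-gene grouping around if you need it for something else.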

ADD COMMENT
