How to compare two files on the basis of two IDs
7.6 years ago

Hi, I wrote a Perl script to compare two files on the basis of two IDs, but it does not work. Can anyone help with this?

File 1:

chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920

File 2:

chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369

#!/usr/bin/perl -w

$pwd = `pwd`;
chomp($pwd);

$file=$ARGV[0];
$file1=$ARGV[1];

open(IN,$file);
while ($line=<IN>){
    chomp($line);

    @ary = split(/\t/,$line);
    chomp($ary[0]);chomp($ary[1]);

    open(SK,$file1);
    while($line1=<SK>)
    {
        chomp($line1);
        @any = split(/\t/,$line1);
        chomp($any[0]); chomp($any[0]);chomp($any[1]);chomp($any[2]);
        if (($ary[0] eq $any[0] and $ary[1] == $any[1]) or ($ary[0] eq $any[0] and $ary[1] == $any[2]))
        {
            print "$line\tE\n";
        }
        else
        {
            print "$line\tM\n";
        }
    }
}

This code prints multiple lines, all with 'M' results. I then tried another script:

#!/usr/bin/perl
use warnings; 
use strict;
use Data::Dumper;

my $file1 = $ARGV[0];
open($infile1,$file1);
my $file2 = $ARGV[1];
open($infile2,$file2);

my %file2_hash;

while (my $line = <$infile1>)
{
    chomp $line;  #so that output with E or M can be on same line
    next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

    my ($chr, $val1, $val2) = split /\s+/,$line;
}
close $infile1;

while (my $line = <$infile2>)
{
    chomp $line;
    next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

    my ($key, $value1, $value2) = split /\s+/, $line; # use better "names" I have
                                                      # no idea of what a chr col
    $file2_hash{"$key:$value1:$value2"} = 1;

    close $infile2;

    if (exists $file2_hash{"$chr:$val1:$val2"})
    {
        print "$line\tE\n";  # match exists with file 1
    }
    else
    {
        print "$line\tM\n";  # match does NOT exist with file 1
    }
}

But I get the same problem again.

What would be a possible solution?

perl • 1.9k views

What are you trying to achieve exactly? If the goal is to compare two lists of positions to see what they have in common, the R package GenomicRanges has many useful functions for that, once both files are loaded as GRanges objects:

findOverlaps(file1, file2), countOverlaps(file1, file2), etc.

6.6 years ago

One way to do this without reinventing the wheel:

  1. Install BEDOPS.

  2. Convert file1.txt to a tab-delimited, three-column BED file, then sort it and file2.unsorted.bed:

    $ awk -v OFS="\t" '{ print $1, $2, ($2 + 1); }' file1.txt | sort-bed - > file1.bed
    $ sort-bed file2.unsorted.bed > file2.bed
    
  3. Then run set operations:

    $ bedops -e 1 file2.bed file1.bed > elements_in_file2_that_overlap_file1.bed
    $ bedops -n 1 file2.bed file1.bed > elements_in_file2_that_do_not_overlap_file1.bed
    

    And conversely:

    $ bedops -e 1 file1.bed file2.bed > elements_in_file1_that_overlap_file2.bed
    $ bedops -n 1 file1.bed file2.bed > elements_in_file1_that_do_not_overlap_file2.bed
    

    Etc.
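
If what you need is the per-line E/M labelling from the question rather than interval overlap, a plain awk sketch may be enough. This is only a minimal sketch and not part of BEDOPS: `label_positions` is a hypothetical helper name, the input is assumed to be whitespace-delimited, and a file1 row is assumed to count as found when its chromosome matches a file2 row and its position equals either the second or the third column of that row.

```shell
#!/bin/sh
# label_positions FILE1 FILE2   (hypothetical helper, for illustration only)
# Prints each non-blank line of FILE1 followed by a tab and E (found) or M (missing).
# Note: the NR == FNR idiom below assumes FILE2 is not empty.
label_positions() {
    awk '
        # First pass (FILE2): remember chromosome:position for both position columns.
        NR == FNR {
            if (NF >= 3) { seen[$1 ":" $2] = 1; seen[$1 ":" $3] = 1 }
            next
        }
        # Second pass (FILE1): skip blank lines, then look up chromosome:position.
        NF >= 2 { print $0 "\t" ((($1 ":" $2) in seen) ? "E" : "M") }
    ' "$2" "$1"
}
```

Call it as `label_positions file1.txt file2.txt`. Loading file2 into a lookup table once avoids re-reading it for every line of file1, and each file1 line is printed exactly once.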

6.6 years ago
mittu1602 ▴ 200

You can also use bedtools, which writes to standard output, so redirect it to a file:

    $ bedtools intersect -wao -a file1.bed -b file2.bed > Output.bed
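
With `-wao`, every file1 record is reported, and the last column holds the number of overlapping bases (0 when nothing in file2 overlaps). A rough sketch of turning that into E/M labels follows. `label_wao` is a hypothetical helper name, and the sketch assumes both inputs are three-column, tab-delimited BED files, so columns 1 to 3 of each output row are the file1 record.

```shell
#!/bin/sh
# label_wao: read `bedtools intersect -wao` output on stdin (hypothetical helper).
# Prints each distinct file1 record once, tagged E if any of its rows reports
# an overlap greater than 0, otherwise M. Duplicate file1 records are collapsed.
label_wao() {
    awk -F '\t' '
        {
            key = $1 "\t" $2 "\t" $3                # the file1 (A) record
            if (!(key in hit)) { order[++n] = key; hit[key] = 0 }
            if ($NF > 0) hit[key] = 1               # last column: overlapping bases
        }
        END {
            # Emit records in the order they were first seen.
            for (i = 1; i <= n; i++)
                print order[i] "\t" (hit[order[i]] ? "E" : "M")
        }
    '
}
```

Usage would be `bedtools intersect -wao -a file1.bed -b file2.bed | label_wao`.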
