Question: How to compare two files on the basis of two IDs
3.7 years ago, Genomebiology wrote:

Hi, I wrote a Perl script to compare two files on the basis of two IDs (chromosome and position), but I could not get it to work. Can anyone help with this?

File 1:

chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920

File 2:

chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369

#!/usr/bin/perl -w

$pwd = `pwd`;

while ($line = <IN>) {
    @ary = split(/\t/, $line);
    @any = split(/\t/, $line1);
    chomp($any[0]); chomp($any[1]); chomp($any[2]);
    if (($ary[0] eq $any[0] and $ary[1] == $any[1]) or ($ary[0] eq $any[0] and $ary[1] == $any[2])) {
        print "$line\tE\n";
    }
    else {
        print "$line\tM\n";
    }
}

This code prints multiple lines, all of them labelled 'M'. Then I tried another script:

use warnings;
use strict;
use Data::Dumper;

my $file1 = $ARGV[0];
my $file2 = $ARGV[1];

my %file2_hash;

open my $infile2, '<', $file2 or die "Cannot open $file2: $!";
while (my $line = <$infile2>) {
   chomp $line;
   next if $line =~ /^\s*$/;   # skip blank lines (a common infile goof)

   my ($key, $value1, $value2) = split /\s+/, $line; # use better "names" I have
                                                     # no idea of what a chr col
   $file2_hash{"$key:$value1:$value2"} = 1;
}
close $infile2;

open my $infile1, '<', $file1 or die "Cannot open $file1: $!";
while (my $line = <$infile1>) {
   chomp $line;  # so that output with E or M can be on same line
   next if $line =~ /^\s*$/;   # skip blank lines (a common infile goof)

   my ($chr, $val1, $val2) = split /\s+/, $line;
   if (exists $file2_hash{"$chr:$val1:$val2"}) {
      print "$line\tE\n";  # match exists in file 2
   }
   else {
      print "$line\tM\n";  # match does NOT exist in file 2
   }
}
close $infile1;



But I get the same problem again.

What would be a possible solution?

perl • 1.2k views
modified 2.7 years ago by mittu1602 • written 3.7 years ago by Genomebiology

What are you trying to achieve exactly? If you want to compare two lists of positions to see what they have in common, the R library GenomicRanges has a lot of nice functions for that once both files are loaded as GRanges objects:

findOverlaps(file1, file2), countOverlaps(file1, file2), etc.

modified 3.7 years ago • written 3.7 years ago by VHahaut
2.7 years ago, Alex Reynolds (Seattle, WA USA) wrote:

One way to do this without reinventing the wheel:

  1. Install BEDOPS.

  2. Convert file1.txt and file2.unsorted.bed into sorted BED files. The awk step turns each single position in file1.txt into a one-base interval:

    $ awk '{ print $1, $2, ($2 + 1); }' file1.txt | sort-bed - > file1.bed
    $ sort-bed file2.unsorted.bed > file2.bed

  3. Then run set operations:

    $ bedops -e 1 file2.bed file1.bed > elements_in_file2_that_overlap_file1.bed
    $ bedops -n 1 file2.bed file1.bed > elements_in_file2_that_do_not_overlap_file1.bed

    And conversely:

    $ bedops -e 1 file1.bed file2.bed > elements_in_file1_that_overlap_file2.bed
    $ bedops -n 1 file1.bed file2.bed > elements_in_file1_that_do_not_overlap_file2.bed
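
If all you need is the exact-match E/M labelling from the question, rather than interval overlaps, plain awk may be enough. The sketch below rests on an assumption about what the asker wants: a line of file1 is labelled E when its chromosome and position equal the chromosome and either the second or the third column of some line in file2, and M otherwise. The function name label_positions is made up for illustration.

```shell
# Hypothetical helper (not part of BEDOPS): label each line of file1 as
# E (exact match found in file2) or M (no match).
# Usage: label_positions file1.txt file2.txt
label_positions() {
    awk '
        # First file on the command line is file2: remember chr+column 2
        # and chr+column 3 as lookup keys.
        NR == FNR {
            if (NF >= 3) { seen[$1, $2] = 1; seen[$1, $3] = 1 }
            next
        }
        # Second file is file1: skip blank or short lines, then append
        # a tab and the label to the original line.
        NF >= 2 {
            print $0 "\t" ((($1, $2) in seen) ? "E" : "M")
        }
    ' "$2" "$1"
}
```

    $ label_positions file1.txt file2.txt > file1.labelled.txt

With the sample files from the question, every File 1 line comes out as E, since each of its chromosome/position pairs appears in the first two columns of File 2. Because awk splits on any whitespace by default, it does not matter whether the files are tab- or space-separated.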


2.7 years ago, mittu1602 wrote:

You can also use bedtools. Note that bedtools intersect writes to standard output, so redirect it to a file:

    $ bedtools intersect -wao -a file1.bed -b file2.bed > Output.bed
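
To turn the bedtools result into the E/M labels the question asks for, one option is to post-process it with awk. This is a rough sketch that assumes file1.bed has exactly three columns and that the last column of the -wao output is the number of overlapping bases (0 when an interval from file1.bed overlaps nothing). label_wao is a hypothetical helper name, and duplicate file1 intervals are reported only once.

```shell
# Hypothetical helper: read `bedtools intersect -wao` output on stdin and
# print each distinct A interval once, labelled E (overlaps something in B)
# or M (no overlap). Assumes A is a three-column BED file.
label_wao() {
    awk '
        BEGIN { FS = OFS = "\t" }
        {
            key = $1 OFS $2 OFS $3
            # Keep first-seen order of A intervals for the final report.
            if (!(key in best)) { order[++n] = key; best[key] = 0 }
            # Last column is the overlap in bases; keep the largest value.
            if ($NF + 0 > best[key]) best[key] = $NF + 0
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], (best[order[i]] > 0 ? "E" : "M")
        }
    '
}
```

    $ bedtools intersect -wao -a file1.bed -b file2.bed | label_wao > file1.labelled.bed

Unlike an exact comparison of positions, this labels an interval E whenever it overlaps any interval in file2.bed, so check that this is the behaviour you want.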
