Question: How to compare two files on the basis of two IDs
Genomebiology (USA) wrote, 3.1 years ago:

Hi, I wrote a Perl script to compare two files on the basis of two IDs, but it does not work. Can anyone help?

File 1:

chr7 151046672
chr7 151047369
chr3 127680920
chr3 127680920

File 2:

chr1 66953622 66953654
chr1 67200451 67200472
chr1 67200475 67200478
chr1 67058869 67058880
chr1 67058881 67058885
chr7 151046672 127680920
chr7 151047369 127680920
chr3 127680920 151046672
chr3 127680920 151047369

#!/usr/bin/perl -w

$pwd = `pwd`;
chomp($pwd);

$file=$ARGV[0];
$file1=$ARGV[1];

open(IN,$file);
while ($line=<IN>){
    chomp($line);

    @ary = split(/\t/,$line);
    chomp($ary[0]);chomp($ary[1]);

    open(SK,$file1);
    while($line1=<SK>)
    {
        chomp($line1);
        @any = split(/\t/,$line1);
        chomp($any[0]); chomp($any[0]);chomp($any[1]);chomp($any[2]);
        if (($ary[0] eq $any[0] and $ary[1] == $any[1]) or ($ary[0] eq $any[0] and $ary[1] == $any[2]))
        {
            print "$line\tE\n";
        }
        else
        { print "$line\tM\n";}
    }
}

This code prints multiple lines, all of them labelled 'M'. Then I tried another script:

#!/usr/bin/perl
use warnings; 
use strict;
use Data::Dumper;

my $file1 = $ARGV[0];
open($infile1,$file1);
my $file2 = $ARGV[1];
open($infile2,$file2);

my %file2_hash;

while (my $line = <$infile1>)
{
   chomp $line;  #so that output with E or M can be on same line
   next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($chr, $val1, $val2) = split /\s+/,$line;
}
close $infile1;

while (my $line = <$infile2>)
{
chomp $line;   
 next if $line =~ /^\s*$/;   #skip blank lines (a common infile goof)

   my ($key, $value1, $value2) = split /\s+/, $line; # use better "names" I have
                                           # no idea of what a chr col
   $file2_hash{"$key:$value1:$value2"} = 1;

close $infile2;

   if (exists $file2_hash{"$chr:$val1:$val2"})
   {
      print "$line\tE\n";  # match exists with file 1   
}
   else
   {
   print "$line\tM\n";  # match does NOT exist with file 1

}

}

But I get the same result again.

What would be a possible solution?

Comment by VHahaut (3.1 years ago):

What are you trying to achieve exactly? If the goal is to compare two lists of positions to see what they have in common, the R package GenomicRanges has many functions for that:

findOverlaps(file1, file2), countOverlaps(file1, file2), etc.
Alex Reynolds (Seattle, WA USA) wrote, 2.2 years ago:

One way to do this without reinventing the wheel:

  1. Install BEDOPS.

  2. Fix your files file1.txt and file2.unsorted.bed:

    $ awk '{ print $1, $2, ($2 + 1); }' file1.txt | sort-bed - > file1.bed
    $ sort-bed file2.unsorted.bed > file2.bed
    
  3. Then run set operations:

    $ bedops -e 1 file2.bed file1.bed > elements_in_file2_that_overlap_file1.bed
    $ bedops -n 1 file2.bed file1.bed > elements_in_file2_that_do_not_overlap_file1.bed
    

    And conversely:

    $ bedops -e 1 file1.bed file2.bed > elements_in_file1_that_overlap_file2.bed
    $ bedops -n 1 file1.bed file2.bed > elements_in_file1_that_do_not_overlap_file2.bed
    

    Etc.
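
If the goal is only the E/M label per file1 line that the question asks for, a plain awk lookup may be enough. The sketch below is not a BEDOPS feature. `label_positions` is a hypothetical helper name, and it assumes whitespace-separated columns, a non-empty second file, and that a file1 position counts as a match when it equals either coordinate column of a file2 line on the same chromosome (the rule the first Perl script in the question tries to apply).

```shell
# Hypothetical helper (not part of BEDOPS or bedtools):
#   label_positions FILE1 FILE2
# Prints every non-blank FILE1 line followed by a tab and E (match) or M (no match).
label_positions() {
    awk '
        # Pass 1 (FILE2, listed first): index chromosome plus each coordinate column.
        # NR == FNR is only true while reading the first file; assumes it is non-empty.
        NR == FNR {
            if (NF >= 3) { seen[$1, $2] = 1; seen[$1, $3] = 1 }
            next
        }
        # Pass 2 (FILE1): skip blank lines, then look up (chromosome, position).
        NF == 0 { next }
        { print $0 "\t" ((($1, $2) in seen) ? "E" : "M") }
    ' "$2" "$1"
}
```

Usage: `label_positions file1.txt file2.unsorted.bed > labelled.txt`. Because file2 is read once into a lookup table, each file1 line is printed exactly once, rather than once per file1/file2 pair as in a nested loop.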

mittu1602 (India) wrote, 2.2 years ago:

You can also use bedtools. It writes to standard output, so redirect the result to a file:

    $ bedtools intersect -wao -a file1.bed -b file2.bed > Output.bed
