Question

How Can I Filter Data From Multiple Files And Print All Of It Into One File

10.8 years ago

Raghav ▴ 100

Dear all,

I have three input files named gene_id.txt, data_set1.txt and data_Set2.txt.

gene_id has entries like:

gnl|UG|Ta#S12874103
gnl|UG|Ta#S12880111
gnl|UG|Ta#S12885252
gnl|UG|Ta#S12916414
gnl|UG|Ta#S12886521
gnl|UG|Ta#S12959389
gnl|UG|Ta#S12889059
gnl|UG|Ta#S12892897
gnl|UG|Ta#S12892904

data_Set1 has [tab separated]:

454    gnl|UG|Ta#S12874103
35    gnl|UG|Ta#S12916414
200    gnl|UG|Ta#S12917670
5    gnl|UG|Ta#S12959389

data_set2 has:

34    gnl|UG|Ta#S12935716
21    gnl|UG|Ta#S12959389

I expect output like:

gene_ids                           data_set1                  data_set2
gnl|UG|Ta#S12874103                454                        0
gnl|UG|Ta#S12880111                0                          0
gnl|UG|Ta#S12885252                0                          0
gnl|UG|Ta#S12916414                35                         0
gnl|UG|Ta#S12886521                0                          0
gnl|UG|Ta#S12959389                5                          21

all gene ids ......

I am looking for a program that checks whether the data_set1 and data_set2 entries occur in the gene_id list and, when a match is found, prints the number [shown in data_set1 and data_set2] next to the corresponding gene ID.
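The matching described above can be sketched in Python with one dictionary per data set. This is only an illustration under stated assumptions: the helper names `read_counts` and `merge_counts` are hypothetical, the data files are assumed to hold `count<TAB>gene_id` lines, and a gene ID missing from a data set is reported as 0, as in the expected output. The sample values are copied from the post, and the in-memory `io.StringIO` objects stand in for the real files.

```python
import io

def read_counts(handle):
    """Map gene ID -> count from 'count<TAB>gene_id' lines."""
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1].strip()] = fields[0].strip()
    return counts

def merge_counts(gene_ids, *tables):
    """One row per gene ID; '0' where a data set has no entry for it."""
    return [[gene] + [t.get(gene, "0") for t in tables] for gene in gene_ids]

# Sample data from the question; real use would pass open("gene_id.txt") etc.
gene_ids = ["gnl|UG|Ta#S12874103", "gnl|UG|Ta#S12880111", "gnl|UG|Ta#S12959389"]
set1 = read_counts(io.StringIO("454\tgnl|UG|Ta#S12874103\n5\tgnl|UG|Ta#S12959389\n"))
set2 = read_counts(io.StringIO("34\tgnl|UG|Ta#S12935716\n21\tgnl|UG|Ta#S12959389\n"))

rows = merge_counts(gene_ids, set1, set2)

out = io.StringIO()  # stand-in for open("output.txt", "w")
out.write("gene_ids\tdata_set1\tdata_set2\n")
for row in rows:
    out.write("\t".join(row) + "\n")
print(out.getvalue(), end="")
```

Each data file is read once and every gene ID is looked up in a dictionary, so the nested loop over all gene/data pairs is not needed, and a third data set only means passing one more table to `merge_counts`.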

I have written a crude Perl script that can handle only data_set1.txt and gene_id.txt at a time. I have not even managed to print the result to a new output file. Here is my program:

$/ = undef;                      # slurp mode: read each file in one go
$aa = $ARGV[0];                  # gene ID list
$bb = $ARGV[1];                  # data_set1
#$cc = $ARGV[2];                 # data_set2 (not handled yet)
open(a1, "$aa");
open(b1, "$bb");
#open(c1, "$cc");

$x = <a1>;
$y = <b1>;
#$z = <c1>;

@gene_name = split(/\n/, $x);
@data_set1 = split(/\n/, $y);
#@data_set2 = split(/\n/, $z);

$flag = 0;

for ($i = 0; $i <= $#gene_name; $i++)
{
    for ($j = 0; $j <= $#data_set1; $j++)
    {
        # each data line is "count<TAB>gene_id"
        ($n1, $n2) = split(/\t/, $data_set1[$j]);
        if ($n2 eq $gene_name[$i])
        {
            $flag = 1;
            print "$gene_name[$i]\t$n1\n";
            #$out = $ARGV[2];
            #open(ff12, ">>$out");    # appending
            #print ff12 "$gene_name[$i]\t$n1\n";
        }
    }

    # no match in data_set1: print the gene ID alone
    if ($flag == 0)
    {
        print "$gene_name[$i]\n";
    }
    $flag = 0;
}

Can anyone help me or suggest a better way to do this?

How can I print the result to an output file? Is there a good way to do it with shell programming?

Thank you in advance

perl python • 2.7k views

ADD COMMENT • link updated 2.8 years ago by Ram 43k • written 10.8 years ago by Raghav ▴ 100

Hi Raghvendra. Please take some time to format your future questions properly. I removed the superfluous spaces in your example files that were making them hard to read.

ADD REPLY • link 10.8 years ago by Eric Normandeau 11k

Ram · Answer 1 · 2013-06-28

10.8 years ago

Pierre Lindenbaum 161k

Not Python and not tested, but it should work:

( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);" && \
  awk -F '\t' '{printf("insert into G(name,C1,C2) values (\"%s\",0,0);\n",$1);}' gene_id.txt && \
  awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}' data_Set1 && \
  awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}' data_Set2 && \
  echo "select * from G;" ) | sqlite3 tmp.db
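For readers who would rather keep everything in one script, the same create / insert / update / select sequence can be sketched with Python's standard `sqlite3` module. This is a swapped-in illustration, not the command from the answer: the function name `build_table` is hypothetical, the database is held in memory instead of in tmp.db, and the inputs are assumed to be already parsed into a list of gene IDs and lists of (count, gene_id) pairs.

```python
import sqlite3

def build_table(gene_ids, set1, set2):
    """gene_ids: list of names; set1/set2: lists of (count, name) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("create table G(name varchar(100) not null unique, C1 int, C2 int)")
    # every gene starts with 0 in both columns
    con.executemany("insert into G(name, C1, C2) values (?, 0, 0)",
                    [(g,) for g in gene_ids])
    # updates for names that are not in G simply change nothing
    con.executemany("update G set C1 = ? where name = ?", set1)
    con.executemany("update G set C2 = ? where name = ?", set2)
    rows = con.execute("select name, C1, C2 from G order by rowid").fetchall()
    con.close()
    return rows

# Sample values taken from the question
rows_sql = build_table(
    ["gnl|UG|Ta#S12874103", "gnl|UG|Ta#S12916414", "gnl|UG|Ta#S12959389"],
    [(454, "gnl|UG|Ta#S12874103"), (35, "gnl|UG|Ta#S12916414"),
     (200, "gnl|UG|Ta#S12917670"), (5, "gnl|UG|Ta#S12959389")],
    [(34, "gnl|UG|Ta#S12935716"), (21, "gnl|UG|Ta#S12959389")],
)
for name, c1, c2 in rows_sql:
    print(f"{name}\t{c1}\t{c2}")
```

The `?` placeholders let the module handle quoting, which avoids escaping quotes inside a `printf` format string.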

ADD COMMENT • link 10.8 years ago by Pierre Lindenbaum 161k

Dear Sir,

Thank you for your quick response. When I run your code, I get an error:

[cdac@nbri surabh]$ ( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);"  && awk -F '\t' '{printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}'  gene_id.txt && awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}'  data_Set1 && awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}'  data_Set2 && echo "select * from G;" ) sqlite3 tmp.db
-bash: syntax error near unexpected token `sqlite3'

ADD REPLY • link updated 2.8 years ago by Ram 43k • written 10.8 years ago by Raghav ▴ 100

fixed, I forgot the | before sqlite3

ADD REPLY • link updated 2.8 years ago by Ram 43k • written 10.8 years ago by Pierre Lindenbaum 161k

Dear Sir,

I got this message on my terminal, along with an output file tmp.db of approximately 3 KB. How do I open it? Can I replace tmp.db with output.txt?

[cdac@nbri surabh]$ ( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);"  && awk -F '\t' '{printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}'  gene_id.txt && awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}'  data_Set1 && awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}'  data_Set2 && echo "select * from G;" ) | sqlite3 tmp.db
awk: {printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}
awk:                                             ^ backslash not last character on line
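On the question of opening tmp.db: it is the SQLite database file that `sqlite3 tmp.db` creates, not a text table, so renaming it to output.txt would not make it readable. In the run shown above the first awk command failed, so the `&&` chain most likely stopped there and table G is probably empty. Once the table has been filled, its rows can be written to a text file. The sketch below is a minimal illustration; the function name `dump_table` and the demo file names are hypothetical, and the demo builds its own small database in a temporary directory so that nothing existing is overwritten.

```python
import os
import sqlite3
import tempfile

def dump_table(db_path, out_path, table="G"):
    """Write every row of `table` in the SQLite file db_path as tab-separated text."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(f'select * from "{table}"').fetchall()
    finally:
        con.close()
    with open(out_path, "w") as out:
        for row in rows:
            out.write("\t".join(str(value) for value in row) + "\n")
    return len(rows)

# Demo database with one row from the question (stand-in for tmp.db)
workdir = tempfile.mkdtemp()
demo_db = os.path.join(workdir, "demo.db")
con = sqlite3.connect(demo_db)
con.execute("create table G(name varchar(100) not null unique, C1 int, C2 int)")
con.execute("insert into G values ('gnl|UG|Ta#S12959389', 5, 21)")
con.commit()
con.close()

demo_out = os.path.join(workdir, "output.txt")
n_rows = dump_table(demo_db, demo_out)
print(n_rows, "row(s) written to", demo_out)
```

With the real files, the call would be `dump_table("tmp.db", "output.txt")`.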

ADD REPLY • link updated 2.8 years ago by Ram 43k • written 10.8 years ago by Raghav ▴ 100