How Can I Filter Data From Multiple Files And Print All Of It Into One File
10.8 years ago
Raghav ▴ 100

Dear all,

I have three input files named gene_id.txt, data_set1.txt and data_Set2.txt.

gene_id has entries like:

gnl|UG|Ta#S12874103
gnl|UG|Ta#S12880111
gnl|UG|Ta#S12885252
gnl|UG|Ta#S12916414
gnl|UG|Ta#S12886521
gnl|UG|Ta#S12959389
gnl|UG|Ta#S12889059
gnl|UG|Ta#S12892897
gnl|UG|Ta#S12892904

data_Set1 has [tab separated]:

454    gnl|UG|Ta#S12874103
35    gnl|UG|Ta#S12916414
200    gnl|UG|Ta#S12917670
5    gnl|UG|Ta#S12959389

data_set2 has:

34    gnl|UG|Ta#S12935716
21    gnl|UG|Ta#S12959389

I expect output like:

gene_ids                           data_set1                  dataset2
gnl|UG|Ta#S12874103                454                        0
gnl|UG|Ta#S12880111                0                          0
gnl|UG|Ta#S12885252                0                          0
gnl|UG|Ta#S12916414                35                         0
gnl|UG|Ta#S12886521                0                          0
gnl|UG|Ta#S12959389                5                          21

all gene ids ......

I am looking for a program that checks the data_Set1 and data_set2 entries against the gene_id list and, where a match is found, prints the number [shown in data_set1 and data_set2] next to the corresponding gene id.

I have written a Perl script in a very crude way that can handle only data_set1.txt and gene_ids.txt at a time. I am not even able to print the result to a new output file. Here is my program:

$/ = undef;                 # slurp mode: read each file in one go
$aa = $ARGV[0];             # gene id list
$bb = $ARGV[1];             # data set 1
#$cc = $ARGV[2];
open(a1, "$aa");
open(b1, "$bb");
#open(c1, "$cc");

$x = <a1>;
$y = <b1>;
#$z = <c1>;

@gene_name = split(/\n/, $x);
@data_set1 = split(/\n/, $y);
#@data_set2 = split(/\n/, $z);

$flag = 0;

for ($i = 0; $i <= $#gene_name; $i++)
{
    for ($j = 0; $j <= $#data_set1; $j++)
    {
        ($n1, $n2) = split(/\t/, $data_set1[$j]);
        if ($n2 eq $gene_name[$i])
        {
            $flag = 1;
            print "$gene_name[$i]\t$n1\n";
            #$out=@ARGV[2];
            #open(ff12,">>$out");#appending
            #print ff12 "gene_name[$i]\t,$n1";
        }
    }

    # no match in data set 1: print the gene id alone
    if ($flag == 0)
    {
        print "$gene_name[$i]\n";
    }
    $flag = 0;
}

Can anyone help me or suggest a better way to do this?

How can I print the result to my output file? Is there a good way to do it with shell programming?

Thank you in advance
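
For readers who want the lookup described in the question spelled out, here is a minimal Python sketch. It is not the poster's method: it swaps the nested loops for dictionary lookups and assumes each data set line is a count followed by whitespace and a gene id, as in the examples above. The function names (read_counts, merge_counts, format_table, main) and the output header are made up for illustration.

```python
def read_counts(lines):
    """Map gene id -> count from lines of the form 'count<whitespace>gene_id'."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        count, gene_id = line.split(None, 1)
        counts[gene_id.strip()] = int(count)
    return counts


def merge_counts(gene_lines, *count_tables):
    """Return one row per gene id, using 0 where a data set has no entry."""
    rows = []
    for line in gene_lines:
        gene_id = line.strip()
        if gene_id:
            rows.append([gene_id] + [t.get(gene_id, 0) for t in count_tables])
    return rows


def format_table(rows, header):
    """Render the header and rows as tab-separated text lines."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return lines


def main(gene_path, set_paths, out_path):
    """Hypothetical driver: read the input files and write one output file."""
    tables = []
    for path in set_paths:
        with open(path) as handle:
            tables.append(read_counts(handle))
    with open(gene_path) as handle:
        rows = merge_counts(handle, *tables)
    header = ["gene_ids"] + list(set_paths)
    with open(out_path, "w") as out:
        out.write("\n".join(format_table(rows, header)) + "\n")
```

Something like main("gene_id.txt", ["data_set1.txt", "data_Set2.txt"], "output.txt") would then write the merged table, assuming those are the real file names. Each gene id costs one dictionary lookup per data set, so adding a third data set only means passing one more path.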

perl python • 2.7k views

Hi Raghvendra. Please take some time to format your future questions properly. I removed the superfluous spaces in your file examples that were making them hard to read.

10.8 years ago

Not Python and not tested, but it should work:

( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);"  && \
 awk -F '\t' '{printf("insert into G(name,C1,C2) values (\"%s\",0,0);\n",$1);}'  gene_id.txt &&
  awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}'  data_Set1 &&
 awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}'  data_Set2 &&
echo "select * from G;" ) | sqlite3 tmp.db
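
Since the question is also tagged python, the same create/insert/update/select sequence can be sketched with Python's standard sqlite3 module. This is only an illustration of the idea above, not a tested replacement for the pipeline: the function name build_table is hypothetical, the table lives in memory rather than in tmp.db, and the inputs are assumed to be already parsed into gene ids and (count, gene_id) pairs.

```python
import sqlite3


def build_table(gene_ids, set1, set2):
    """Return (name, C1, C2) rows built in an in-memory SQLite table.

    gene_ids: iterable of gene id strings
    set1, set2: iterables of (count, gene_id) pairs
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE G(name VARCHAR(100) NOT NULL UNIQUE, C1 INT, C2 INT)"
    )
    # Every gene id starts with a count of 0 in both columns.
    con.executemany(
        "INSERT INTO G(name, C1, C2) VALUES (?, 0, 0)",
        [(gene_id,) for gene_id in gene_ids],
    )
    # Updates for ids that are not in G simply match no rows.
    con.executemany("UPDATE G SET C1 = ? WHERE name = ?", list(set1))
    con.executemany("UPDATE G SET C2 = ? WHERE name = ?", list(set2))
    rows = con.execute("SELECT name, C1, C2 FROM G ORDER BY rowid").fetchall()
    con.close()
    return rows
```

The ? placeholders let sqlite3 do the quoting, which avoids the nested shell and awk quoting that the one-liner needs.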

Dear Sir,

Thank you for your quick response. When I run your code I get an error:

[cdac@nbri surabh]$ ( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);"  && awk -F '\t' '{printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}'  gene_id.txt && awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}'  data_Set1 && awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}'  data_Set2 && echo "select * from G;" ) sqlite3 tmp.db
-bash: syntax error near unexpected token `sqlite3'

Fixed: I forgot the | before sqlite3.


Dear Sir,

I got this message on my terminal, along with an output file tmp.db of approximately 3 KB. How do I open it? Can I replace tmp.db with output.txt?

[cdac@nbri surabh]$ ( echo "create table G(name varchar(100) not NULL unique, C1 int,C2 int);"  && awk -F '\t' '{printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}'  gene_id.txt && awk -F '\t' '{printf("update G set C1=%s where name=\"%s\";\n",$1,$2);}'  data_Set1 && awk -F '\t' '{printf("update G set C2=%s where name=\"%s\";\n",$1,$2);}'  data_Set2 && echo "select * from G;" ) | sqlite3 tmp.db
awk: {printf("insert into G(name,C1,C2) values ("\"%s\",0,0);\n",$1);}
awk:                                             ^ backslash not last character on line
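
On opening tmp.db: it is a binary SQLite database file, so giving it the name output.txt would not turn it into readable text. One way to get a text file is to query the table and write the rows out, sketched below in Python with the standard sqlite3 module. The function name export_table is hypothetical, and the sketch assumes the table G was created and filled successfully, which the awk error above suggests may not yet be the case.

```python
import sqlite3


def export_table(db_path, out_path, table="G"):
    """Write every row of `table` in the SQLite file to a tab-separated text file."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(f'SELECT * FROM "{table}"')
        # cursor.description holds one tuple per column; item 0 is the name.
        header = [column[0] for column in cur.description]
        with open(out_path, "w") as out:
            out.write("\t".join(header) + "\n")
            for row in cur:
                out.write("\t".join(str(value) for value in row) + "\n")
    finally:
        con.close()
```

Calling export_table("tmp.db", "output.txt") would then produce a plain text table with one line per gene id.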