formatting problem (awk/bash)
6.5 years ago
lessismore ★ 1.3k

Hello everybody,

Really simple question, I think. I have this:

text1 5 r1
text2 4 r1
text3 6 r1
text1 6 r2
text2 44 r2
text3 5 r2

I would like to have this:
          r1 r2
text1     5  6
text2     4  44
text3     6  5

Thanks in advance.

datamash output (datamash splits fields on TABs by default, so add -W if test.txt is space-separated):

$ datamash crosstab 1,3 sum 2 < test.txt  
        r1  r2
text1   5   6
text2   4   44
text3   6   5

input:

$ cat test.txt 
text1   5   r1
text2   4   r1
text3   6   r1
text1   6   r2
text2   44  r2
text3   5   r2
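
If datamash is not installed, the same reshaping can be done in plain awk. This is a minimal sketch, not the poster's method; it assumes whitespace-separated columns in the order ID, value, replicate. The function name pivot is hypothetical. Output is TAB-separated with a header row, genes and replicates keep the order in which they first appear, and any missing gene/replicate cell is written as NA.

```shell
#!/bin/sh
# Hypothetical helper: pivot "ID value replicate" lines into an ID x replicate matrix.
# Assumes whitespace-separated input with exactly these three columns.
pivot() {
  awk '
    NF < 3 { next }   # skip blank or short lines
    {
      # Remember IDs and replicates in first-seen order, so the result
      # does not depend on how the input lines are sorted.
      if (!($1 in seen_gene)) { seen_gene[$1] = 1; genes[++ng] = $1 }
      if (!($3 in seen_rep))  { seen_rep[$3] = 1;  reps[++nr] = $3 }
      val[$1, $3] = $2   # a repeated ID/replicate pair: the last value wins
    }
    END {
      # Header row: empty first cell, then one column per replicate.
      line = ""
      for (j = 1; j <= nr; j++) line = line "\t" reps[j]
      print line
      for (i = 1; i <= ng; i++) {
        line = genes[i]
        for (j = 1; j <= nr; j++) {
          k = genes[i] SUBSEP reps[j]
          # Missing ID/replicate combinations are written as NA.
          line = line "\t" ((k in val) ? val[k] : "NA")
        }
        print line
      }
    }' "$@"
}

# Example with the data from the question.
printf '%s\n' 'text1 5 r1' 'text2 4 r1' 'text3 6 r1' \
              'text1 6 r2' 'text2 44 r2' 'text3 5 r2' | pivot
```

Tracking the first-seen order explicitly is deliberate: awk's for (key in array) loop visits keys in an unspecified order, so relying on it could shuffle rows or columns.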

Maybe this is not the right place to ask this question. (You could check the datamash utility.)

What is the logic? How is it related to bioinformatics? Please post the real data.

Sorry, I simplified the problem. The text entries are gene IDs and the numbers are expression values. The idea is to generate a matrix for plotting the data; "r" stands for replicate.

Real data are always better, as some special cases must be handled depending on your expression values and IDs. ;)

Edit: Do you always have the r1 lines, then the r2 lines, and so on? Or can they be unordered?
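
For real expression data, a more useful question than line order is whether every gene appears exactly once in every replicate. A minimal awk sketch of such a check, assuming whitespace-separated columns in the order gene ID, value, replicate; the function name check_grid is hypothetical. It reports duplicated gene/replicate pairs and genes absent from a replicate, and exits with status 1 if it found either.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check "ID value replicate" lines before reshaping.
# Prints one line per problem; exit status is 1 if any problem was found.
check_grid() {
  awk '
    NF < 3 { next }   # skip blank or short lines
    {
      n[$1, $3]++
      # Report each duplicated ID/replicate pair once.
      if (n[$1, $3] == 2) { printf "duplicate: %s in %s\n", $1, $3; bad = 1 }
      gene[$1] = 1
      rep[$3] = 1
    }
    END {
      # Every ID should occur in every replicate.
      for (g in gene)
        for (r in rep)
          if (!((g, r) in n)) { printf "missing: %s in %s\n", g, r; bad = 1 }
      exit bad
    }' "$@"
}

# Example: the data from the question is complete, so nothing is printed.
printf '%s\n' 'text1 5 r1' 'text2 4 r1' 'text3 6 r1' \
              'text1 6 r2' 'text2 44 r2' 'text3 5 r2' | check_grid
```

Because awk iterates arrays in an unspecified order, the "missing" lines may come out in any order; pipe through sort if a stable report is needed.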

6.5 years ago
# sqlite3 without a file argument uses an in-memory database; \047 is awk's octal escape for a single quote (SQL string literal)
awk 'BEGIN{printf("create table T(gene unique,r1,r2);\n");}{printf("insert or ignore into T(gene) values(\047%s\047);\nupdate T set %s=%s where gene=\047%s\047;\n",$1,$3,$2,$1);}END{printf("select * from T; drop table T;\n");}' input.txt | sqlite3

text1|5|6
text2|4|44
text3|6|5