Question

Data clustering / matching entries

4.5 years ago

bioinfobeginner • 0

I have a file

A 1 2 3 4 5 6

B 2 4 54 34 3 10

C 1 2 4 5 7 5 2

D 3 4 5 6 73 4

A 11 232 4 6 7 8

E 232 53 64 76 76 54

A 0 0 0 0 0 0 

B 12 34 56 23 35 76

C 23 45 65 24 23 24

I want to cluster the data so that rows sharing the same first column are grouped together. I am new to programming. Could anyone help me write the code? I want the output as:

A 1 2 3 4 5 6

A 11 232 4 6 7 8

A 0 0 0 0 0 0

(new line)

B 2 4 54 34 3 10

B 12 34 56 23 35 76

(new line)

C 23 45 65 24 23 24

C 1 2 4 5 7 5 2

(new line)

D 3 4 5 6 73 4

(new line)

E 232 53 64 76 76 54.

(new line)

I think I can set it up as a dictionary and search for repeated keys in Python, or use awk's NR==FNR idiom in bash, but I don't know how to write it in code form. Could anyone help?

python bash • 760 views

updated 4.5 years ago by Alex Reynolds 35k • written 4.5 years ago by bioinfobeginner • 0

score 2 · Answer 1 · 2019-11-19

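If you need the rows within each group presented in the order they appear in the original file (note that awk does not guarantee the order in which `for (k in arr)` visits keys, so the groups themselves may come out in a different order on other awk implementations):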

$ awk '{ arr[$1] = arr[$1]"|"$0; } END{ for (k in arr) { s=arr[k]; n=split(s,a,"|"); for (i=2;i<=n;i++) { print a[i]; } print ""; } }' entries.txt 
A 1 2 3 4 5 6
A 11 232 4 6 7 8
A 0 0 0 0 0 0 

B 2 4 54 34 3 10
B 12 34 56 23 35 76

C 1 2 4 5 7 5 2
C 23 45 65 24 23 24

D 3 4 5 6 73 4

E 232 53 64 76 76 54

$
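
Since the question also mentions a Python dictionary, here is a minimal sketch of that idea. It is not part of the awk answer above, and the function names `group_rows` and `format_groups` are made up for illustration. It assumes the label is the first whitespace-delimited field of each row and relies on Python 3.7+ dictionaries keeping insertion order, so groups appear in order of first appearance and rows keep their original order within each group.

```python
from collections import defaultdict


def group_rows(lines):
    """Group rows by their first whitespace-delimited field.

    Hypothetical helper: returns a dict mapping each label to the list
    of rows carrying that label, in the order the rows were read.
    """
    groups = defaultdict(list)  # insertion-ordered in Python 3.7+
    for line in lines:
        line = line.rstrip()  # drop the newline and any trailing spaces
        if not line:
            continue  # ignore blank lines between rows
        label = line.split()[0]
        groups[label].append(line)
    return groups


def format_groups(groups):
    """Join each group's rows and follow every group with a blank line."""
    return "".join("\n".join(rows) + "\n\n" for rows in groups.values())


# Small demo using a few rows from the question.
sample = [
    "A 1 2 3 4 5 6",
    "B 2 4 54 34 3 10",
    "A 11 232 4 6 7 8",
    "B 12 34 56 23 35 76",
]
print(format_groups(group_rows(sample)), end="")
```

To run it on a real file, you could pass an open file handle instead of the `sample` list, for example `group_rows(open("entries.txt"))`. Unlike the awk one-liner, this does not use `|` as an internal separator, so rows that happen to contain that character should still be handled.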