Data clustering / matching entries
4.5 years ago

I have a file

A 1 2 3 4 5 6

B 2 4 54 34 3 10

C 1 2 4 5 7 5 2

D 3 4 5 6 73 4

A 11 232 4 6 7 8

E 232 53 64 76 76 54

A 0 0 0 0 0 0 

B 12 34 56 23 35 76

C 23 45 65 24 23 24

I want to cluster the data by the first column. I am new to programming; could anyone help me write the code? I want the output to be:

A 1 2 3 4 5 6

A 11 232 4 6 7 8

A 0 0 0 0 0 0

(new line)

B 2 4 54 34 3 10

B 12 34 56 23 35 76

(new line)

C 23 45 65 24 23 24

C 1 2 4 5 7 5 2

(new line)

D 3 4 5 6 73 4

(new line)

E 232 53 64 76 76 54.

(new line)

I think I could build a dictionary and look for repeated keys in Python, or use the NR==FNR idiom in awk from bash, but I don't know how to write it as code. Could anyone help?

python bash • 760 views
4.5 years ago

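If you need the lines within each group presented in their original order (note that awk does not guarantee the order in which for (k in arr) visits the groups themselves):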

$ awk '{ arr[$1] = arr[$1]"|"$0; } END{ for (k in arr) { s=arr[k]; n=split(s,a,"|"); for (i=2;i<=n;i++) { print a[i]; } print ""; } }' entries.txt 
A 1 2 3 4 5 6
A 11 232 4 6 7 8
A 0 0 0 0 0 0 

B 2 4 54 34 3 10
B 12 34 56 23 35 76

C 1 2 4 5 7 5 2
C 23 45 65 24 23 24

D 3 4 5 6 73 4

E 232 53 64 76 76 54

$
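Since the question also mentions the dictionary idea in Python, here is a minimal sketch of that approach. It assumes the key is the first whitespace-separated field and that blank lines in the input can be ignored. The function names group_lines and format_groups and the file name entries.txt are only illustrative. Because Python dictionaries (3.7+) keep insertion order, the groups come out in the order their keys first appear in the file.

```python
from collections import defaultdict


def group_lines(lines):
    """Group lines by their first whitespace-separated field.

    Keys appear in first-seen order, and lines within a group keep
    their original order. Blank lines are skipped (an assumption).
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines between records
        key = line.split()[0]
        groups[key].append(line)
    return groups


def format_groups(groups):
    """Join each group's lines, separating groups with one blank line."""
    return "\n\n".join("\n".join(members) for members in groups.values()) + "\n"


if __name__ == "__main__":
    # Small inline sample; to read a real file instead, pass an open
    # file handle, e.g. group_lines(open("entries.txt")).
    sample = [
        "A 1 2 3 4 5 6",
        "B 2 4 54 34 3 10",
        "A 11 232 4 6 7 8",
        "B 12 34 56 23 35 76",
    ]
    print(format_groups(group_lines(sample)), end="")
```

Unlike the awk version, this does not rely on a separator character such as "|", so it should also work if a line happens to contain that character.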