Question: Data clustering / matching entries
12 months ago, nandelvibhuti wrote:

I have a file

A 1 2 3 4 5 6

B 2 4 54 34 3 10

C 1 2 4 5 7 5 2

D 3 4 5 6 73 4

A 11 232 4 6 7 8

E 232 53 64 76 76 54

A 0 0 0 0 0 0 

B 12 34 56 23 35 76

C 23 45 65 24 23 24

I want to cluster the data by the label in the first column. I am new to programming. Could anyone help me write the code? I want the output as:

A 1 2 3 4 5 6

A 11 232 4 6 7 8

A 0 0 0 0 0 0

(new line)

B 2 4 54 34 3 10

B 12 34 56 23 35 76

(new line)

C 23 45 65 24 23 24

C 1 2 4 5 7 5 2

(new line)

D 3 4 5 6 73 4

(new line)

E 232 53 64 76 76 54

(new line)

I think I could set it up as a dictionary and search for repeated keys in Python, or use awk's NR==FNR idiom in bash, but I don't know how to write it in code form. Could anyone help?

bash python • 241 views
12 months ago, Alex Reynolds (Seattle, WA, USA) wrote:

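If you need the lines within each group presented in their original order (this assumes no line contains a "|" character, and note that awk visits the keys in for (k in arr) in an unspecified order, so the order of the groups themselves can differ between awk implementations):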

$ awk '{ arr[$1] = arr[$1]"|"$0; } END{ for (k in arr) { s=arr[k]; n=split(s,a,"|"); for (i=2;i<=n;i++) { print a[i]; } print ""; } }' entries.txt 
A 1 2 3 4 5 6
A 11 232 4 6 7 8
A 0 0 0 0 0 0 

B 2 4 54 34 3 10
B 12 34 56 23 35 76

C 1 2 4 5 7 5 2
C 23 45 65 24 23 24

D 3 4 5 6 73 4

E 232 53 64 76 76 54

$
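
The question also mentions the Python dictionary route. Here is a minimal sketch of that idea, not taken from the thread: it assumes Python 3.7 or later (where dictionaries keep insertion order), whitespace-separated columns, and that blank lines in the input can be dropped. The names group_lines and format_groups are made up for illustration.

```python
from collections import defaultdict


def group_lines(lines):
    """Group lines by their first whitespace-separated field.

    Groups come out in the order their label is first seen, and lines
    inside a group keep their original order.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip()  # drop the newline and trailing spaces
        if not line:          # skip blank lines between records
            continue
        label = line.split()[0]
        groups[label].append(line)
    return groups


def format_groups(groups):
    """Join each group's lines and put one empty line after every group."""
    return "".join("\n".join(rows) + "\n\n" for rows in groups.values())


# A few rows in the shape of the question's file (illustrative subset).
sample = [
    "A 1 2 3 4 5 6",
    "B 2 4 54 34 3 10",
    "C 1 2 4 5 7 5 2",
    "A 11 232 4 6 7 8",
    "B 12 34 56 23 35 76",
]
print(format_groups(group_lines(sample)), end="")
```

To run it on a real file, pass an open file handle instead of the list, for example group_lines(open("entries.txt")); iterating over a file yields its lines one at a time. Unlike the awk loop, the group order here follows first appearance in the input.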