Question

Sorting clusters from CD-HIT

19 months ago

Nabil • 0

Greetings,

I used cd-hit to find sequence similarity between proteins, and now I need to sort the resulting clusters. Here is an example of the output:

>Cluster 0
0    287aa, >CM_M_XP_007408389.1... at 62.72%
1    293aa, >CM_M_XP_007408535.1... *
>Cluster 1
0    291aa, >CM_TX_POW04575.1... at 100.00%
1    292aa, >CM_ST_POW09224.1... *
>Cluster 2
0    286aa, >CM_PG_KAA1076669.1... *
>Cluster 3
0    285aa, >CM_M_XP_007406760.1... *
>Cluster 4
0    278aa, >CM_PS_KNZ46184.1... *
1    275aa, >CM_PT_OAV90755.1... at 58.91%
>Cluster 5
0    241aa, >CM_PG_KAA1096683.1... at 51.04%
1    266aa, >CM_PG_KAA1096686.1... *
2    236aa, >CM_PG_KAA1113276.1... at 50.85%
3    262aa, >CM_PG_KAA1113279.1... at 94.66%
4    241aa, >CM_CRL_EFP86512.1... at 50.62%
5    248aa, >CM_ST_POW02451.1... at 52.02%
>Cluster 6
0    251aa, >CM_PS_KNZ44295.1... *
>Cluster 7
0    236aa, >CM_PG_KAA1083848.1... at 88.98%
1    250aa, >CM_PG_KAA1119265.1... *
2    250aa, >CM_CRL_EHS63005.1... at 100.00%
>Cluster 8
0    250aa, >CM_PS_KNZ57382.1... *
>Cluster 9
0    236aa, >CM_PG_KAA1105946.1... *
1    236aa, >CM_PG_KAA1114903.1... at 97.46%
2    236aa, >CM_CRL_EFP74682.1... at 97.46%
3    235aa, >CM_TX_POW15956.1... at 84.68%
4    235aa, >CM_ST_POW04000.1... at 84.68%
5    232aa, >CM_PT_OAV92548.1... at 88.79%

Is there a way to sort the clusters by the number of proteins in each one, without doing it manually?

Here's an image in case the text is messed up:

image of text

cd-hit • 637 views

updated 18 months ago by Ram 44k • written 19 months ago by Nabil • 0

What is the question here?

I formatted the list you included at the top properly (with the 10101 code option in the editor), but it looks similar to the screenshot you posted.

19 months ago by GenoMax 144k
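
If the goal is to reorder the clusters in the listing above by how many proteins each one contains, one possible approach is a short Python script. This is only a minimal sketch, assuming the cluster listing has exactly the layout shown in the question (a `>Cluster N` header followed by one line per member). The function names `parse_clstr`, `sort_by_size`, and `format_clstr` are made up for illustration and are not part of cd-hit.

```python
from typing import Dict, List, Tuple


def parse_clstr(text: str) -> Dict[str, List[str]]:
    """Group member lines under the '>Cluster N' header they follow."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 5"
            clusters[current] = []
        elif current is not None:
            clusters[current].append(line)
    return clusters


def sort_by_size(clusters: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Largest clusters first; equally sized clusters keep their original order."""
    return sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)


def format_clstr(ordered: List[Tuple[str, List[str]]]) -> str:
    """Write the clusters back out in the same layout as the input."""
    out: List[str] = []
    for name, members in ordered:
        out.append(">" + name)
        out.extend(members)
    return "\n".join(out)


# Three clusters copied from the listing in the question.
SAMPLE = """\
>Cluster 2
0    286aa, >CM_PG_KAA1076669.1... *
>Cluster 4
0    278aa, >CM_PS_KNZ46184.1... *
1    275aa, >CM_PT_OAV90755.1... at 58.91%
>Cluster 7
0    236aa, >CM_PG_KAA1083848.1... at 88.98%
1    250aa, >CM_PG_KAA1119265.1... *
2    250aa, >CM_CRL_EHS63005.1... at 100.00%
"""

if __name__ == "__main__":
    ordered = sort_by_size(parse_clstr(SAMPLE))
    for name, members in ordered:
        print(name, len(members))
    # prints:
    # Cluster 7 3
    # Cluster 4 2
    # Cluster 2 1
```

For a real run, `SAMPLE` would be replaced by the contents of the cluster file read from disk, and `format_clstr(ordered)` gives the reordered listing as text. Passing `reverse=False` in `sort_by_size` would put the singleton clusters first instead.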