3 months ago
Nabil
Greetings,
I have used cd-hit to find sequence similarity between proteins, and now I need to sort the resulting clusters. Here is an example:
>Cluster 0
0 287aa, >CM_M_XP_007408389.1... at 62.72%
1 293aa, >CM_M_XP_007408535.1... *
>Cluster 1
0 291aa, >CM_TX_POW04575.1... at 100.00%
1 292aa, >CM_ST_POW09224.1... *
>Cluster 2
0 286aa, >CM_PG_KAA1076669.1... *
>Cluster 3
0 285aa, >CM_M_XP_007406760.1... *
>Cluster 4
0 278aa, >CM_PS_KNZ46184.1... *
1 275aa, >CM_PT_OAV90755.1... at 58.91%
>Cluster 5
0 241aa, >CM_PG_KAA1096683.1... at 51.04%
1 266aa, >CM_PG_KAA1096686.1... *
2 236aa, >CM_PG_KAA1113276.1... at 50.85%
3 262aa, >CM_PG_KAA1113279.1... at 94.66%
4 241aa, >CM_CRL_EFP86512.1... at 50.62%
5 248aa, >CM_ST_POW02451.1... at 52.02%
>Cluster 6
0 251aa, >CM_PS_KNZ44295.1... *
>Cluster 7
0 236aa, >CM_PG_KAA1083848.1... at 88.98%
1 250aa, >CM_PG_KAA1119265.1... *
2 250aa, >CM_CRL_EHS63005.1... at 100.00%
>Cluster 8
0 250aa, >CM_PS_KNZ57382.1... *
>Cluster 9
0 236aa, >CM_PG_KAA1105946.1... *
1 236aa, >CM_PG_KAA1114903.1... at 97.46%
2 236aa, >CM_CRL_EFP74682.1... at 97.46%
3 235aa, >CM_TX_POW15956.1... at 84.68%
4 235aa, >CM_ST_POW04000.1... at 84.68%
5 232aa, >CM_PT_OAV92548.1... at 88.79%
Is there a way to sort the clusters by the number of proteins in each one without doing it manually?
Here's an image in case the text is messed up:
What is the question here?
I formatted the list you had included at the top properly (with the 10101 code option in the editor), but it looks similar to the screenshot you posted.
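If the goal is to order the clusters by how many proteins each one contains, one possible approach is a short Python script. This is only a sketch, assuming the cluster file has the layout shown in the post: a header line starting with ">Cluster", followed by one line per member. The function names and the SAMPLE text (clusters 0, 2 and 7 copied from the post) are illustrative and not part of cd-hit.

```python
def parse_clstr(lines):
    """Group cluster-file lines into (header, members) pairs."""
    clusters = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">Cluster"):
            clusters.append((line, []))  # start a new cluster
        elif clusters:
            clusters[-1][1].append(line)  # member of the current cluster
    return clusters


def sort_by_size(clusters):
    """Largest clusters first; clusters of equal size keep their original order."""
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)


# Three clusters copied from the post, used here only as a demonstration.
SAMPLE = """\
>Cluster 0
0 287aa, >CM_M_XP_007408389.1... at 62.72%
1 293aa, >CM_M_XP_007408535.1... *
>Cluster 2
0 286aa, >CM_PG_KAA1076669.1... *
>Cluster 7
0 236aa, >CM_PG_KAA1083848.1... at 88.98%
1 250aa, >CM_PG_KAA1119265.1... *
2 250aa, >CM_CRL_EHS63005.1... at 100.00%
"""

for header, members in sort_by_size(parse_clstr(SAMPLE.splitlines())):
    print(f"{header} ({len(members)} proteins)")
    for member in members:
        print(member)
```

To run it on a real file, the same two functions can be given an open file handle instead of SAMPLE.splitlines(), for example parse_clstr(open("proteins.clstr")), where proteins.clstr is a placeholder name for your own cd-hit output.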