I have a list of proteins that I want to divide into clusters that are homogeneous in terms of likely function.
As a first approach I clustered them using h-cd-hit, with three iterations at 90%, 80% and 70% similarity, allowing a difference of at most 75 aa among the proteins. These parameters were chosen because they resolve my data best.
The results are decent, but when I look at the domain composition of the representative sequence of each cluster, I can see that in some cases different clusters have highly similar domain architectures. I would say that a similar domain architecture suggests a similar function. Therefore I would like to perform a second clustering based on the similarity of domain architecture.
In this case 1) and 2) will cluster together.
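To make concrete what I mean by clustering on domain architecture, here is a rough sketch of the idea. It is not an existing tool, and everything in it is an assumption for illustration: the protein IDs and domain names are made up, Jaccard similarity over the set of domains is just one possible measure (it ignores domain order and copy number), and single linkage with a 0.5 threshold is an arbitrary choice.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two domain collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_architecture(architectures, threshold=0.5):
    """Single-linkage clustering of proteins by shared domain content.

    architectures: dict mapping protein ID -> list of domain names.
    Two proteins are linked when their Jaccard similarity >= threshold.
    """
    ids = list(architectures)
    parent = {p: p for p in ids}  # union-find forest

    def find(p):
        # Follow parent links to the root, compressing the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(ids, 2):
        if jaccard(architectures[p], architectures[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for p in ids:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical representative sequences and domain annotations.
example = {
    "rep1": ["Kinase", "SH2", "SH3"],
    "rep2": ["Kinase", "SH2"],
    "rep3": ["ABC_tran"],
}
clusters = cluster_by_architecture(example, threshold=0.5)
print(clusters)
```

Here rep1 and rep2 share two of three distinct domains, so they end up in the same group, while rep3 stays on its own. An order-aware comparison (for example, an alignment of the domain strings) would be stricter than this set-based version.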
Is there any available tool for doing that?