I'm trying to work out the best way to obtain training data for structure prediction. I'm interested in predicting the structure of individual domains rather than inter-domain contacts and conformation, so the data needs to be organized by domain, with as much redundancy eliminated as possible. There are complicating factors, though. A domain will appear in many proteins, and some proteins will appear in many different PDB entries. Ligands, small molecules, and stoichiometry can all affect conformation to a degree. I therefore have a notion of a canonical structure for a domain. Does such a thing exist, and if so, where can I find it? These canonical structures would then be used as training data for blind structure prediction of a domain.