I would like to understand how the PDB manages domains. My understanding is that the internal folding of a domain is independent of the inter-domain contacts that may occur in the final 3D protein structure. If this is the case, why are domains that recur across a variety of proteins repeated in every PDB file in which they occur, rather than stored once in some kind of non-redundant representation and referenced? Is there a way to obtain a non-redundant representation of the three-dimensional structures of protein domains in terms of internal coordinates?
The PDB holds experimental structures. Technically, it holds physical models that authors built from experimental electron density maps (the Uppsala repository holds these maps for PDB structures: http://eds.bmc.uu.se/eds/). Because of this, the structure of the same domain differs from one model to another. When you see several structures of the same protein in the PDB, chances are high that they were solved with different protein sequences, small molecules (ligands, drugs) or heavy metals. All of this affects the experimental structures in the PDB.
Your idea of a database of non-redundant 3D structures for domains is interesting. Some databases like this exist, built for benchmarking small-molecule docking. But then the question is: what are these structures going to be used for? The way you optimize structures differs from task to task; analysis of mutations is very different from drug design.
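On the internal-coordinates part of the question, here is a minimal sketch of what such a representation is built from: a torsion (dihedral) angle computed from four atom positions. It is only an illustration, not how the PDB stores anything. The function name `dihedral` and the plain-tuple coordinate format are my own assumptions; in practice you would read the atom coordinates from a PDB file with a structure-parsing library.

```python
import math

def _sub(a, b):
    # Component-wise difference a - b of two 3D points.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points.

    Hypothetical helper for illustration. For a backbone phi angle
    the four points would be C(i-1), N(i), CA(i), C(i); for psi they
    would be N(i), CA(i), C(i), N(i+1).
    Collinear points are degenerate and give 0.0 here.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)   # central bond
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

The value depends only on the relative positions of the four atoms, so it stays the same if the whole domain is translated or rotated inside a larger structure. That is the sense in which internal coordinates are independent of where the domain sits. It does not remove the differences between experimental models described above: two models of the same domain will generally still give somewhat different torsion angles.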