I need to create a test set of protein sequences, and I plan to use the PDB advanced search to filter them. The selection criteria will probably include chain length, resolution, macromolecule type, and so on, all of which are easy to implement.
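
For concreteness, here is a minimal sketch of how I imagine expressing those easy filters as a JSON query for the RCSB PDB Search API. The attribute names (`entity_poly.rcsb_sample_sequence_length`, `rcsb_entry_info.resolution_combined`, `entity_poly.rcsb_entity_polymer_type`), the `polymer_entity` return type, and the threshold values are my assumptions and should be checked against the current Search API documentation; the helper names `terminal` and `build_filter_query` are hypothetical.

```python
import json


def terminal(attribute, operator, value):
    """Build one attribute filter node (assumed Search API 'terminal' shape)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": operator,
            "value": value,
        },
    }


def build_filter_query(min_length, max_length, max_resolution,
                       polymer_type="Protein"):
    """Combine chain length, resolution, and macromolecule type with AND.

    All attribute names below are assumptions to verify against the
    RCSB search attribute list before use.
    """
    nodes = [
        # Chain length within [min_length, max_length]
        terminal(
            "entity_poly.rcsb_sample_sequence_length",
            "range",
            {
                "from": min_length,
                "to": max_length,
                "include_lower": True,
                "include_upper": True,
            },
        ),
        # Resolution at or better than max_resolution (in angstroms)
        terminal(
            "rcsb_entry_info.resolution_combined",
            "less_or_equal",
            max_resolution,
        ),
        # Macromolecule type, e.g. protein only
        terminal(
            "entity_poly.rcsb_entity_polymer_type",
            "exact_match",
            polymer_type,
        ),
    ]
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": nodes},
        "return_type": "polymer_entity",
        "request_options": {"paginate": {"start": 0, "rows": 100}},
    }


# Placeholder thresholds: 40 to 500 residues, 2.5 angstroms or better.
payload = json.dumps(build_filter_query(40, 500, 2.5), indent=2)
```

The resulting `payload` string is what I would send as the request body; the thresholds are placeholders, and nothing in this sketch handles sequence identity yet.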
But there is another restriction: I also need to retrieve only representative sequences at 30% sequence identity. How can I achieve this?