I am trying to map ChEMBL protein targets to their corresponding crystal structures in the PDB. In most cases this can be done via the UniProt accession ID using the SIFTS resource. For some protein targets, however, ChEMBL also reports that the target carries engineered mutations; in those cases it provides the mutated sequence together with the UniProt accession ID of the original sequence.
The problem is the following: how can I programmatically find out whether the PDB contains an entry that matches both the UniProt ID and the engineered mutations?
I suppose I need to align the mutated sequence against the sequences of the candidate PDB entries. Is there a Python or command-line package you would recommend for this?
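To make the question concrete, here is a minimal sketch of the check I have in mind. It assumes the simplest case only: the ChEMBL mutated sequence has the same length as the wild-type UniProt sequence (substitutions only, no insertions or deletions), and the UniProt position of the first residue of the PDB chain is already known, for example from SIFTS. The function names and the toy sequences are hypothetical and made up for illustration; anything with indels would need a real alignment instead.

```python
def point_mutations(wild_type, mutant):
    """Return substitutions as (position, wt_residue, mutant_residue), 1-based.

    Assumes both sequences have the same length (substitutions only).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences differ in length; a real alignment is needed")
    return [
        (pos, wt, mut)
        for pos, (wt, mut) in enumerate(zip(wild_type, mutant), start=1)
        if wt != mut
    ]


def chain_has_mutations(chain_seq, uniprot_start, mutations):
    """Check whether a PDB chain segment carries every listed mutation.

    chain_seq: sequence of the PDB chain segment
    uniprot_start: 1-based UniProt position of chain_seq[0] (assumed known)
    mutations: output of point_mutations()
    """
    for pos, _wt, mut in mutations:
        idx = pos - uniprot_start
        # A mutation outside the crystallised segment cannot be confirmed.
        if idx < 0 or idx >= len(chain_seq):
            return False
        if chain_seq[idx] != mut:
            return False
    return True


# Toy data, not real proteins.
wild_type = "MKTAYIAKQR"
mutant = "MKTAYIVKQR"          # A7V substitution
mutations = point_mutations(wild_type, mutant)

mutant_chain = "TAYIVKQ"       # covers UniProt positions 3-9, carries A7V
wild_chain = "TAYIAKQ"         # same segment, wild type

print(mutations)
print(chain_has_mutations(mutant_chain, 3, mutations))
print(chain_has_mutations(wild_chain, 3, mutations))
```

With this, a PDB entry would count as a match only if its chain maps to the right UniProt ID and `chain_has_mutations` returns `True` for it.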
Thank you! Looking forward to reading your suggestions :)