I'm interested in the impact of protein post-translational modifications (PTMs) on structure, in particular on ligand binding. This includes both the affinity of small molecules for a target protein's active site when that protein is modified, and novel protein docking that can occur when an amino acid is modified (the canonical example being histone tail PTMs).
I am aware that ColabFold, although it was not trained specifically on PTMs, is "aware" of them in its predictions, because some structures in the training data were solved with modified residues present, so this seems like an OK baseline. There is also an alphafold-ptm model, but the "ptm" there appears to refer to fine-tuning with a predicted TM-score (pTM) head rather than to including PTM sequences in training.
However, as there are dedicated PTM databases (IRMPS http://cluster.physics.iisc.ac.in/imrps/; PRISMOID https://prism.erc.monash.edu/), I want to know whether there are any PTM-aware protein folding, docking, or affinity predictors, including any under development, where (for instance) FASTA sequences can be augmented with UNIMOD/PSI-MOD suffixes (e.g. EM[UNIMOD:35]EVEES[UNIMOD:21]PEK). I have not turned any up.
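For illustration, the bracketed notation in that example (which resembles the HUPO-PSI ProForma format) can be split into a bare sequence plus a list of modified positions. This is a minimal sketch that assumes only `[UNIMOD:n]` tags attached to single uppercase residues; PSI-MOD accessions, named modifications and terminal modifications are not handled, and `parse_unimod_sequence` is a hypothetical helper, not part of any existing tool.

```python
import re
from typing import List, Tuple

# One residue letter, optionally followed by a bracketed UNIMOD accession,
# e.g. "M[UNIMOD:35]" or a bare "E". (Assumption: UNIMOD tags only.)
_TOKEN = re.compile(r"([A-Z])(?:\[UNIMOD:(\d+)\])?")


def parse_unimod_sequence(annotated: str) -> Tuple[str, List[Tuple[int, str, int]]]:
    """Split an annotated peptide into (bare sequence, modifications).

    Each modification is a (1-based position, residue, UNIMOD accession) tuple.
    Raises ValueError if any part of the string is not a residue or a UNIMOD tag.
    """
    residues: List[str] = []
    mods: List[Tuple[int, str, int]] = []
    end = 0
    for match in _TOKEN.finditer(annotated):
        # finditer skips non-matching text, so check that tokens are contiguous.
        if match.start() != end:
            raise ValueError(f"unexpected text at index {end}: {annotated[end:match.start()]!r}")
        residue, accession = match.group(1), match.group(2)
        residues.append(residue)
        if accession is not None:
            mods.append((len(residues), residue, int(accession)))
        end = match.end()
    if end != len(annotated):
        raise ValueError(f"unexpected trailing text: {annotated[end:]!r}")
    return "".join(residues), mods
```

For the example sequence this should return `("EMEVEESPEK", [(2, "M", 35), (7, "S", 21)])`, i.e. oxidation (UNIMOD:35) on Met2 and phosphorylation (UNIMOD:21) on Ser7. Keeping the bare sequence separate means it can still be fed to an ordinary folding model while the modification list is handled by whatever PTM-aware step comes next.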
A number of the relevant docked structures have already been solved, so it should be possible to incorporate those data points into the OpenFold training data. The only issue would be mapping PTMs to new input features (and, of course, retraining). How likely do you think it is that a PTM-aware model will be released in the next six months or so?
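One way to picture "mapping PTMs as new input features" is to append a per-residue modification one-hot to the usual residue one-hot. The sketch below is a toy encoding under stated assumptions: it is not how OpenFold actually builds its features, and `PTM_VOCAB`, `encode_residues` and the channel layout are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical PTM vocabulary mapping UNIMOD accession -> channel index.
# Channel 0 means "unmodified"; the last channel catches any other modification.
PTM_VOCAB: Dict[int, int] = {21: 1, 35: 2, 1: 3}  # Phospho, Oxidation, Acetyl
N_PTM = len(PTM_VOCAB) + 2  # + "unmodified" and "other" channels
OTHER_PTM = N_PTM - 1


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_residues(sequence: str, mods: Sequence[Tuple[int, str, int]]) -> List[List[float]]:
    """Return one feature row per residue: residue one-hot + PTM one-hot.

    `mods` holds (1-based position, residue, UNIMOD accession) tuples.
    """
    ptm_at: Dict[int, int] = {}
    for position, residue, accession in mods:
        if sequence[position - 1] != residue:
            raise ValueError(f"residue mismatch at position {position}")
        ptm_at[position] = PTM_VOCAB.get(accession, OTHER_PTM)

    rows: List[List[float]] = []
    for i, residue in enumerate(sequence, start=1):
        aa_index = AMINO_ACIDS.index(residue)  # ValueError on non-standard letters
        ptm_index = ptm_at.get(i, 0)
        rows.append(one_hot(aa_index, len(AMINO_ACIDS)) + one_hot(ptm_index, N_PTM))
    return rows
```

Each row here has 25 values (20 residue channels plus 5 PTM channels). The "other" channel lets an unseen modification still be flagged rather than silently dropped; a real model would more likely learn an embedding per modification, and the harder part remains having enough solved structures containing each modified residue to train on.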
I don't think it is likely that a PTM-aware model will be released soon. There is still a lot of fine-tuning to be done for protein-protein and protein-ligand docking, and I think both of those areas have greater demand and more training data available. That said, it takes only one group deciding to tackle PTMs for my prediction to be wrong, so anything is possible.