10 weeks ago
I'm starting to study protein modeling and I have a few questions.

  1. When there are parts at the end of the protein that cannot be modeled or are very difficult to model, should I remove them?
  2. Is it possible to model a complex of a protein with more than 1300 aa in the AlphaFold2 colab? If so, what parameters should I change to be able to do this?
  3. What software can I use to refine the model?
  4. What software can I use to build a trimer from a monomer?
10 weeks ago
dthorbur ★ 1.9k
  1. No, you shouldn't remove parts of the sequence that fold poorly. If you are only interested in specific domains, go for it, but in my experience the terminal ends of most sequences model poorly with AF2. I suspect this is often due to poor MSA depth at the start and end of the sequence, though I have seen exceptions to this idea.

  2. Do you mean the whole complex is ~1300 AA in length, or that each subunit is 1300 AA? I frequently fold multimers up to a total length of 2400 AA using ColabFold, but I use a private GCP GPU VM. Note that this is probably around the limit of an NVIDIA T4, but with an A100 you could easily fold much longer complexes. If I'm not mistaken, OpenFold is more memory efficient, so if you really need to fold a large heteromeric complex, it may be possible on the same hardware.

  3. AMBER relaxation is a form of refinement and is built into the ColabFold workflow. Otherwise you can fine-tune AF2, retrain OpenFold, or create a custom MSA database. I'm sure there are plenty of other things you can try too, but these are what I tend to focus on.

  4. I can't seem to find them right now, but I've seen a few dedicated implementations of AF2 designed specifically for large homo-oligomeric complexes. With a little searching on Google Scholar you should be able to find them.
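
For points 2 and 4, here is a minimal sketch of how you might build a homotrimer query from a monomer and check its total length before submitting it. It assumes ColabFold's convention of separating chains with `:` in the query sequence. The helper names (`oligomer_query`, `total_length`) and the placeholder sequence are hypothetical, and the 2400-residue figure is only the rough T4 limit from my own experience above, not a hard cap.

```python
# Sketch only: build a ':'-separated homo-oligomer query and count residues.
# ASSUMPTIONS: chains are joined with ':' (ColabFold's input convention);
# ROUGH_T4_LIMIT is an anecdotal figure, not a documented hardware limit.

ROUGH_T4_LIMIT = 2400  # total residues across all chains (anecdotal)


def oligomer_query(monomer: str, copies: int) -> str:
    """Return `copies` copies of `monomer` joined by ':' (one chain each)."""
    seq = "".join(monomer.split()).upper()  # drop whitespace and newlines
    if not seq or copies < 1:
        raise ValueError("need a non-empty sequence and at least one copy")
    return ":".join([seq] * copies)


def total_length(query: str) -> int:
    """Total residue count across all chains in a ':'-separated query."""
    return sum(len(chain) for chain in query.split(":"))


if __name__ == "__main__":
    monomer = "MKTAYIAKQR"  # placeholder sequence, not a real protein
    query = oligomer_query(monomer, 3)
    n = total_length(query)
    verdict = "likely OK on a T4" if n <= ROUGH_T4_LIMIT else "may need a larger GPU"
    print(query)
    print(f"{n} residues: {verdict}")
```

Note that what matters for memory is the total length across all chains, so a trimer of a 1300 AA monomer is a 3900-residue job, well past what I would try on a T4.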

Thank you so much! You've helped me a lot!


