I downloaded a bunch of PDB files from http://lmc.uab.cat/TMalphaDB/info.php, where each PDB file contains only the transmembrane helices of the membrane proteins.
I would like to cluster them by structure, and my first attempt was with Foldseek. The best I could come up with is the following command:
foldseek easy-multimercluster bundles-alpha/ bundles-alpha-tm50 tmp --multimer-tm-threshold 0.5 --cov-mode 1 --alignment-type 2 --interface-lddt-threshold 0.5 --tmscore-threshold 0.5 --lddt-threshold 0.5
easy-multimercluster is needed because the transmembrane helices are all disconnected. This resulted in some clusters with 4 to 5 members, but mostly singletons.
I looked for alternative structure clustering methods, such as Secondary Structure Matching (I can only find the PDBeFold web server), but they seem like a great hassle to use.
Are there any convenient resources for me to cluster the transmembrane domains of membrane proteins, or any suggestions of how I could accomplish this with Foldseek?
How many is a bunch? Have you checked how similar the structures are, both within the clusters and among the singletons? I would inspect them in a visualisation tool like ChimeraX or PyMOL. Foldseek is the most appropriate tool I can think of for this use case, so if it's not working, that may be a reflection of the data. Another option would be to include more of each protein, so the inputs aren't just disconnected helices, and see whether that changes the outcome.
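Before opening anything in a viewer, it may help to tabulate how the cluster sizes are distributed, so you know which clusters and how many singletons there are to inspect. Below is a minimal sketch, assuming the run produced a two-column, tab-separated file (representative, then member) named something like bundles-alpha-tm50_cluster.tsv. Both the file name and the column layout are assumptions to check against your actual output, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def read_clusters(handle):
    """Group members by representative from a two-column TSV.

    Assumed layout (check against your own output):
    column 1 = cluster representative, column 2 = cluster member.
    """
    clusters = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clusters[row[0]].append(row[1])
    return dict(clusters)


def size_histogram(clusters):
    """Map each cluster size to the number of clusters of that size."""
    return Counter(len(members) for members in clusters.values())


# Tiny made-up example standing in for the real file.
example = "repA\trepA\nrepA\tm1\nrepB\trepB\n"
clusters = read_clusters(io.StringIO(example))
hist = size_histogram(clusters)

for size in sorted(hist):
    print(f"{hist[size]} cluster(s) of size {size}")
```

To run it on real output, replace the io.StringIO(example) handle with open("bundles-alpha-tm50_cluster.tsv", newline="") (again, an assumed file name). The members of the larger clusters are then natural candidates to superimpose first.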