Structural Clustering of Transmembrane Domains of Membrane Proteins
11 weeks ago
William • 0

I downloaded a bunch of PDB files from http://lmc.uab.cat/TMalphaDB/info.php, where each PDB file contains only the transmembrane helices of the membrane proteins.

I would like to cluster them by structure, and my first attempt was with Foldseek. The best I could come up with is the following command:

foldseek easy-multimercluster bundles-alpha/ bundles-alpha-tm50 tmp \
  --multimer-tm-threshold 0.5 \
  --cov-mode 1 \
  --alignment-type 2 \
  --interface-lddt-threshold 0.5 \
  --tmscore-threshold 0.5 \
  --lddt-threshold 0.5

easy-multimercluster is needed because the transmembrane helices are all disconnected. This resulted in some clusters with 4 to 5 members, but mostly singletons.

I looked for alternative structure clustering methods, such as Secondary Structure Matching (I can only find the PDBeFold web server), but they seem like a great hassle.

Are there any convenient resources for me to cluster the transmembrane domains of membrane proteins, or any suggestions of how I could accomplish this with Foldseek?


How many is a bunch? Have you checked how similar the clusters/singletons are? I would check them in a visualisation tool like ChimeraX or PyMOL. Foldseek is the most appropriate tool I can think of for this use case, so if it's not working, that might be a reflection of the data. Another option would be to include more of each protein, so the inputs aren't just disconnected helices, and see whether that changes the outcome.

12 days ago
Kevin Blighe ★ 90k

You downloaded PDB files containing only transmembrane helices from TMalphaDB and attempted structural clustering using Foldseek's easy-multimercluster command, which produced mostly singletons despite some small clusters.

Foldseek remains appropriate for this task, but your results suggest that the disconnected helices require parameter adjustments to capture distant structural similarities. Lower the similarity thresholds to include more matches, and switch from 3Di+AA alignment (--alignment-type 2) to TMalign (--alignment-type 1) for global alignment, which better suits helical bundles. Preprocess your PDB files into a database for efficiency.

Here is an improved command:

foldseek createdb bundles-alpha/ transDB
foldseek createindex transDB tmp
foldseek easy-multimercluster transDB bundles-alpha-tm50 tmp \
  --multimer-tm-threshold 0.4 \
  --cov-mode 1 \
  --alignment-type 1 \
  --interface-lddt-threshold 0.4 \
  --tmscore-threshold 0.3 \
  --lddt-threshold 0.4 \
  --num-iterations 0

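Setting --num-iterations 0 is meant to enable iterative searching and may yield larger clusters; run foldseek easy-multimercluster -h to confirm that your version accepts the option. If your version rejects the prebuilt transDB as input, pass the bundles-alpha/ directory directly, as in your original command. If singletons persist, the data may lack sufficient structural conservation, as noted in the comment above. Visualize cluster representatives in PyMOL or ChimeraX to confirm.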
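To judge whether a parameter change actually reduced the number of singletons, it helps to tabulate cluster sizes before and after. Below is a minimal sketch, assuming the cluster assignments are available as a two-column, tab-separated file (representative, then member), one row per member. The file name in the usage note and the toy data are hypothetical, and the exact output layout of your Foldseek version should be checked first.

```python
import csv
import io
from collections import Counter


def cluster_sizes(tsv_lines):
    """Count members per representative in a two-column cluster TSV.

    Assumes column 1 is the cluster representative and column 2 a member.
    """
    sizes = Counter()
    for row in csv.reader(tsv_lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sizes[row[0]] += 1
    return sizes


def size_histogram(sizes):
    """Map cluster size -> number of clusters of that size."""
    return Counter(sizes.values())


# Toy, made-up assignments: one cluster of three and two singletons.
example = io.StringIO("a\ta\na\tb\na\tc\nd\td\ne\te\n")
sizes = cluster_sizes(example)
hist = size_histogram(sizes)
for size in sorted(hist):
    print(f"clusters with {size} member(s): {hist[size]}")
```

For a real run, replace the toy data with an open file handle, for example open("bundles-alpha-tm50_cluster.tsv") if that is what your run produced, and compare the histograms from the strict and relaxed thresholds.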

For alternatives, use ChimeraX's Similar Structures tool. It searches with Foldseek and clusters hits by backbone traces using UMAP, ideal for transmembrane helices. Provide a query PDB, search against PDB or AlphaFold databases, fetch C-alpha coordinates, and cluster via the 'similarstructures cluster' command, specifying residues in helices.

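Another resource is RCSB PDB's Advanced Search > Structure Similarity, which finds similar structures using BioZernike shape descriptors; upload a structure file or search by ID. Hits can then be compared pairwise with the site's structure alignment tool, which offers FATCAT among other algorithms.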

TMKit offers Python-based analysis of transmembrane proteins, including structural feature extraction from PDBs. Extract features like helix orientations, then cluster via scikit-learn's DBSCAN or hierarchical methods.
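As an illustration of what a "helix orientation" feature could look like, here is a rough standard-library sketch that does not use TMKit. It assumes each helix is a run of consecutive residue numbers within one chain (so a numbering gap or chain change starts a new helix), and it approximates a helix axis as the vector between the centroids of the first and last four C-alpha atoms. All function names are hypothetical, and the input is synthetic.

```python
import math


def helix_segments(pdb_lines):
    """Split C-alpha atoms into segments at chain changes or numbering gaps."""
    segments, previous = [], None
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]))  # (chain ID, residue number)
        if key == previous:
            continue  # alternate location of the same residue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if previous is None or key[0] != previous[0] or key[1] != previous[1] + 1:
            segments.append([])
        segments[-1].append(xyz)
        previous = key
    return segments


def centroid(points):
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))


def helix_axis(segment, window=4):
    """Unit vector from the centroid of the first to the last `window` atoms."""
    start, end = centroid(segment[:window]), centroid(segment[-window:])
    vec = tuple(e - s for s, e in zip(start, end))
    norm = math.sqrt(sum(v * v for v in vec))
    return tuple(v / norm for v in vec)


def crossing_angle(axis_a, axis_b):
    """Angle in degrees between two unit axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def consecutive_crossing_angles(pdb_lines, min_length=8):
    """Angles between successive helices; short segments are skipped."""
    axes = [helix_axis(s) for s in helix_segments(pdb_lines) if len(s) >= min_length]
    return [crossing_angle(a, b) for a, b in zip(axes, axes[1:])]


def _fake_ca_line(serial, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record for a C-alpha atom (test data only)."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Two synthetic straight "helices": one along z, one along x, with a numbering gap.
lines = [_fake_ca_line(i + 1, "A", i + 1, 0.0, 0.0, 1.5 * i) for i in range(10)]
lines += [_fake_ca_line(i + 11, "A", i + 20, 10.0 + 1.5 * i, 0.0, 0.0) for i in range(10)]
angles = consecutive_crossing_angles(lines)
print(angles)
```

Because each axis follows residue order, antiparallel neighbours give angles near 180 degrees rather than 0. For bundles with the same number of helices, the resulting angle lists are fixed-length vectors that could be handed to scikit-learn's DBSCAN or a hierarchical method, as suggested above.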

Kevin
