PDB chain ID specification justification
20 months ago
dyang20 • 0

Hello. According to the PDB format specification, a chain ID can consist of only 1 (or 2) alphanumeric characters. Is there a reason for this limitation? It would be more convenient to have chain IDs that are more descriptive than a single-letter identifier. For example, if I have a multimer containing multiple proteins, each stored as a chain in the PDB file, the only way to tell the proteins apart is to know their sequence length or structure beforehand. This is bothersome when analyzing the structure in visualization software such as ChimeraX, or when running a Python script to analyze crosslinks between specific proteins.

proteomics PDB ChimeraX proteins • 633 views
20 months ago
Mensur Dlakic ★ 27k

The PDB format was created in the 1970s, or even the 60s. In those days it is likely that not much thought was given to the problems you describe, because they would have been difficult to foresee. Since the format is tabular, with fixed column positions for each field, it is impossible to fit an arbitrary number of characters into the chain ID without breaking the parsers of thousands of programs. I don't think that is going to change any time soon, if ever. That said, I have seen chain IDs of up to 4 letters (I think that's the maximum spacing allotted to the chain ID), though that most likely doesn't help you very much.
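To make the fixed-column point concrete, here is a minimal Python sketch. It assumes the standard ATOM record layout, in which the chain ID occupies column 22 alone and the residue number follows in columns 23-26. The helpers `make_atom_line` and `parse_atom_line`, and the `CHAIN_NAMES` table with its protein names, are hypothetical and only for illustration; they are not taken from any particular library.

```python
def make_atom_line(chain_id: str) -> str:
    """Build a toy ATOM record (hypothetical helper, coordinates are made up)."""
    return (
        f"ATOM  {1:>5d} {' N':<4s} {'MET':>3s} {chain_id}{1:>4d}    "
        f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}"
    )


def parse_atom_line(line: str) -> dict:
    """Read fields by fixed position, as many PDB readers do."""
    return {
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22, a single character
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


# Assumed lookup table: keep one-letter chain IDs in the file and attach
# descriptive names in the analysis script instead.
CHAIN_NAMES = {"A": "antibody heavy chain", "B": "antibody light chain"}


def describe_chain(chain_id: str) -> str:
    """Return a descriptive name for a chain, falling back to the raw ID."""
    return CHAIN_NAMES.get(chain_id, f"chain {chain_id}")


record = parse_atom_line(make_atom_line("A"))
print(record["chain_id"], "->", describe_chain(record["chain_id"]))

# A longer chain ID pushes every later field out of its column, so the
# residue-number slice no longer contains a number and parsing fails.
try:
    parse_atom_line(make_atom_line("HeavyChain"))
except ValueError as err:
    print("parse failed:", err)
```

In this sketch, the descriptive names live in the script rather than in the file, so the file stays readable by existing tools while the analysis code can still refer to proteins by name.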
