Question: ZDOCK Benchmark PDB files have unusual format
7 months ago by
james20
james20 wrote:

I am new to working with PDB format files, and I am having difficulty working with the ZDOCK Benchmark files.

Their input PDBs for generating decoys seem to have 2 extra columns, and the filenames end with *.pdb.ms

Does anyone know what type of files these really are?

The output decoy PDBs generated by their software maintain these extra columns. For example:

ATOM      1  N   GLU A   6      72.093  26.103  78.886  8     1 1.63         -0.15
ATOM      2  CA  GLU A   6      71.909  24.863  78.143  8     1 2.03          0.10
ATOM      3  C   GLU A   6      70.753  24.029  78.676  8     1 1.67          0.60
ATOM      4  O   GLU A   6      70.717  23.551  79.806  8     1 1.38         -0.55

Column 10 (the integer 8 on every line) and the last column do not seem to be usual PDB fields. Can I just ignore these columns and create a standard PDB file?

next-gen • 222 views
modified 6 months ago • written 7 months ago by james20

Sorry for the late reply. I thought I would be notified by email if anyone responded.

The problem is that in the example I gave, the fields do not match the ATOM specification. They only approximately correspond. Here is what a pdb file downloaded with pdb-tools looks like. It matches the ATOM specification exactly:

ATOM      1  N   LYS A   4      28.189   5.020  62.680  1.00 68.66           N  
ATOM      2  CA  LYS A   4      27.705   5.368  64.017  1.00 67.66           C  
ATOM      3  C   LYS A   4      26.198   5.204  64.109  1.00 64.00           C  
ATOM      4  O   LYS A   4      25.398   5.669  63.303  1.00 63.53           O

You can see the difference between the ZDOCK files I was posting about, and the "legit" pdb format.
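To make the mismatch concrete, here is a minimal Python sketch that slices both sample lines at the fixed positions the ATOM specification assigns to occupancy (character columns 55-60) and temperature factor (61-66). The helper names `strict_fields` and `try_float` are made up for this illustration, and the two lines are copied from the posts above on the assumption that their spacing survived intact.

```python
# First ATOM line from each example above, spacing preserved.
ZDOCK_LINE = "ATOM      1  N   GLU A   6      72.093  26.103  78.886  8     1 1.63         -0.15"
STANDARD_LINE = "ATOM      1  N   LYS A   4      28.189   5.020  62.680  1.00 68.66           N  "


def strict_fields(line):
    """Slice occupancy (cols 55-60) and B-factor (cols 61-66) by position."""
    return line[54:60], line[60:66]


def try_float(text):
    """Return the field as a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


for label, line in (("standard", STANDARD_LINE), ("zdock", ZDOCK_LINE)):
    occupancy, b_factor = strict_fields(line)
    print(label, repr(occupancy), try_float(occupancy), repr(b_factor), try_float(b_factor))
```

On the lines as posted, the standard file yields two valid numbers, while the ZDOCK line yields a B-factor slice that straddles two of its own fields and does not parse as a number. A reader that converts these columns strictly could fail at that point.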

written 6 months ago by james20

I already said what I thought was pertinent to your problem, but I will repeat:

nothing that is in the occupancy field and beyond (starting with character column #55) should break them.

After the XYZ atom coordinates (that is, from character column #55 onward), many programs put information into their PDB output that may or may not follow the specification. This shouldn't matter, as most programs either completely ignore everything that comes after the coordinates, or at least do not rely on that information in any serious way. Unless you tried loading ZDOCK's PDB files and that failed somehow, I think you should not worry about anything from character column #55 onward (that is, anything after the 9th whitespace-separated field).
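As an illustration of a reader that ignores everything after the coordinates, here is a minimal Python sketch that reads only character columns 1-54 of each ATOM/HETATM line. The function name `read_coordinates` and the `SAMPLE` string are made up for this example; the two lines come from the ZDOCK excerpt in the question.

```python
# Two ATOM lines from the ZDOCK excerpt, spacing preserved.
SAMPLE = "\n".join([
    "ATOM      1  N   GLU A   6      72.093  26.103  78.886  8     1 1.63         -0.15",
    "ATOM      2  CA  GLU A   6      71.909  24.863  78.143  8     1 2.03          0.10",
])


def read_coordinates(pdb_text):
    """Return (atom, residue, chain, res_seq, x, y, z) tuples.

    Only character columns 1-54 are read, so whatever a program
    writes after the z-coordinate has no effect on the result.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append((
            line[12:16].strip(),   # atom name, cols 13-16
            line[17:20].strip(),   # residue name, cols 18-20
            line[21],              # chain ID, col 22
            int(line[22:26]),      # residue sequence number, cols 23-26
            float(line[30:38]),    # x, cols 31-38
            float(line[38:46]),    # y, cols 39-46
            float(line[46:54]),    # z, cols 47-54
        ))
    return atoms


for atom in read_coordinates(SAMPLE):
    print(atom)
```

Slicing by character position rather than splitting on whitespace is deliberate: PDB is a fixed-width format, and adjacent fields can run together without a separating space.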

written 6 months ago by Mensur Dlakic7.1k

Ah, thank you. I didn't read the original answer carefully enough.

Unfortunately, the program I am trying to use (https://github.com/kiharalab/DOVE) exits with an error depending on what comes after the z-coordinate. I will look into the code to determine whether it actually relies on that information.
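If a downstream program does insist on spec-conformant fields, one possible workaround is to rewrite each ATOM line, keeping columns 1-54 and filling the rest with placeholder values. The sketch below is only that: `to_standard_atom_line` and `guess_element` are hypothetical helpers, the occupancy of 1.00 and B-factor of 0.00 are arbitrary placeholders, and the element is guessed from the atom name with a rough heuristic. Whether DOVE accepts the result has not been tested here.

```python
def guess_element(atom_name_field):
    """Rough heuristic: take the letters in cols 13-14 of the atom name.

    Works for names such as ' N  ', ' CA ' (carbon) and 'CA  ' (calcium).
    Four-character hydrogen names starting with 'H' are treated as hydrogen.
    This is an assumption for illustration, not part of any specification.
    """
    if len(atom_name_field.strip()) == 4 and atom_name_field.startswith("H"):
        return "H"
    return "".join(ch for ch in atom_name_field[:2] if ch.isalpha()).upper()


def to_standard_atom_line(line, occupancy=1.00, b_factor=0.00):
    """Keep cols 1-54 and write placeholder occupancy, B-factor, element."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave all other records untouched
    core = line[:54].ljust(54)
    element = guess_element(line[12:16])
    # occupancy: cols 55-60, B-factor: cols 61-66, element: cols 77-78
    return f"{core}{occupancy:6.2f}{b_factor:6.2f}{'':10}{element:>2}"


zdock_line = "ATOM      1  N   GLU A   6      72.093  26.103  78.886  8     1 1.63         -0.15"
print(to_standard_atom_line(zdock_line))
```

Note that this discards the ZDOCK-specific values after the coordinates, so it should be applied to a copy of the decoy files, not to inputs that ZDOCK itself still needs to read.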

written 6 months ago by james20

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

written 6 months ago by genomax92k
7 months ago by
Mensur Dlakic7.1k
USA
Mensur Dlakic7.1k wrote:

These are not "extra" columns, though they may not contain the information that is normally there. The explanation of all fields in ATOM lines is here. Some programs read the information beyond the 9th field (the z-coordinate) and others do not, but nothing in the occupancy field and beyond (starting with character column #55) should break them. Unless you have some reason that is not obvious to me, you are probably OK leaving the file as it is.

PS An older guide to PDB format is here.

modified 7 months ago • written 7 months ago by Mensur Dlakic7.1k