Question: Renumber PDB Files To Match Actual Sequence
7.7 years ago, Whetting (1.5k), Bethesda, MD, wrote:

I am working on a project aimed at compiling papillomavirus sequence information. I will gladly share the link if people are interested, but I do not want to spam. As part of the effort, we want to show alignments between PDB structure files and HPV sequences.
We noticed that several PDB files are not numbered according to the actual genome. For example, suppose only the C-terminal domain of protein X was crystallized: the numbering should run from residue 250 to residue 500, but the crystallographer numbered the PDB file according to the crystallized peptide instead. Does anyone have suggestions for a program that can do the renumbering? Thanks!

EDIT: I think I may have found a solution.
I think I can write a tool using pdbsws and a Perl file I found here:

pdb sequence • 4.1k views
7.7 years ago, Vladimir Chupakhin (520), Toledo, Spain, wrote:

PDB numbering is sometimes quite a mess. I have used protein alignment, but it is impractical for the full PDB database. Take a look at the pdbsws service.


That's pretty cool, wish I had known about that one earlier!

Reply written 7.7 years ago by Will (4.5k)
7.7 years ago, Will (4.5k), United States, wrote:

I've come across the same problem. My method has been to align the PDB sequences with the relevant protein sequences (using a local alignment) and determine the proper numbering from the alignment. I wrote a simple MATLAB script to do the renumbering, but any language should work just as well.

Also, don't forget to account for gaps in the PDB sequences. I've found many instances where the crystal structure is missing parts in the middle.
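
The alignment-based approach described above could be sketched roughly as follows, in Python rather than MATLAB. This is not the script mentioned in the answer: it is a plain Smith-Waterman local alignment with a linear gap penalty, and the function names, scoring values, and example sequences are all made up for illustration. A real pipeline would more likely use an established alignment library and a proper substitution matrix.

```python
def local_align_map(pdb_seq, full_seq, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment (linear gap penalty).

    Returns {pdb_index: full_index} (both 0-based) for every aligned pair
    on the best-scoring local alignment. Scoring values are illustrative.
    """
    n, m = len(pdb_seq), len(full_seq)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if pdb_seq[i - 1] == full_seq[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # align the two residues
                    score[i - 1][j] + gap,       # PDB residue left unaligned
                    score[i][j - 1] + gap)       # full-length residue skipped
            score[i][j] = s
            if s > best:
                best, best_pos = s, (i, j)

    # Trace back from the best cell until the score drops to zero.
    mapping = {}
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if pdb_seq[i - 1] == full_seq[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping


def full_numbering(pdb_seq, full_seq):
    """1-based position in full_seq for each PDB residue (None if unaligned)."""
    mapping = local_align_map(pdb_seq, full_seq)
    return [mapping[i] + 1 if i in mapping else None
            for i in range(len(pdb_seq))]


# Made-up example: the "crystal" covers residues 6-12 and 15-22 of the
# full-length sequence, i.e. two residues are missing in the middle.
full_seq = "MKTAYIAKQRQISFVKSHFSRQ"
pdb_seq = full_seq[5:12] + full_seq[14:22]
print(full_numbering(pdb_seq, full_seq))
# -> [6, 7, 8, 9, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22]
```

The jump from 12 to 15 in the output is the kind of internal gap mentioned above: the numbering cannot be recovered by adding a single offset, so each residue needs its own mapped position.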


Hi Will, the problem I ran into was that it seemed impossible to completely renumber the entire PDB file. For example, the helix and sheet records (among others) have to be renumbered as well. Did you write a script that updated all of those lines, or is that not necessary for parsing the PDB file?

Reply written 7.7 years ago by Whetting (1.5k)

Essentially, I just used the script to write out the position (X, Y, Z), chain, original index, and full-protein index of each amino acid to a separate file. Then I used that file for my downstream analysis. I didn't try to write anything back into the PDB file.
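
A side table like this might be produced along the following lines. This is only a sketch under some assumptions: it reads the fixed-column ATOM records of the legacy PDB format, takes the C-alpha atom as "the position" of each residue (the reply does not say which atom was used), and expects the original-to-full-protein index mapping as a ready-made dictionary. The names `residue_table`, `write_table`, and `to_full_index` are hypothetical.

```python
import csv


def residue_table(pdb_lines, to_full_index):
    """Collect one row per C-alpha atom from PDB-format lines.

    to_full_index: dict mapping the residue number found in the file to
    the full-protein residue number (assumed to come from an alignment).
    Rows are (x, y, z, chain, original_index, full_index); full_index is
    None when the residue is not in the mapping.
    """
    rows = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                          # skip HETATM, HELIX, SHEET, ...
        if line[12:16].strip() != "CA":
            continue                          # keep one atom per residue
        chain = line[21]                      # column 22: chain identifier
        original = int(line[22:26])           # columns 23-26: residue number
        x = float(line[30:38])                # columns 31-38
        y = float(line[38:46])                # columns 39-46
        z = float(line[46:54])                # columns 47-54
        rows.append((x, y, z, chain, original, to_full_index.get(original)))
    return rows


def write_table(rows, handle):
    """Write the rows as CSV with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["x", "y", "z", "chain", "original_index", "full_index"])
    writer.writerows(rows)
```

Because the PDB file itself is never modified, HELIX, SHEET, and similar records do not need to be touched. The trade-off is that any downstream tool has to read the side table rather than the structure file to get the full-protein numbering. The sketch also ignores insertion codes and alternate locations.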

Reply written 7.7 years ago by Will (4.5k)