PyMOL: Gaps in protein structure yet shown by the sequence
6 weeks ago
spence.lank

In the screenshot below, I'm trying to determine what exactly is meant by the gap in the protein structure, depicted by a dashed noodle.

I'm trying to determine what is meant by the greyed out residues: VSGTNGT. The protein structure seems to suggest that they are not present because only one residue looks like it could fit in that dashed noodle gap, but then what exactly is between the H and K residues if not the VSGTNGT residues?

The PDB ID for this structure is 7DK3 and the literature is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7775788/

6 weeks ago
Mensur Dlakic

Dashed lines indicate that at least one residue is missing from the structure, possibly more.

If you search for REMARK 465 in the PDB file for 7dk3, you will find that many residues are missing in all 3 chains.
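If you'd rather not read through the file by eye, here is a minimal Python sketch of how the REMARK 465 rows could be pulled out of a PDB-format file. It assumes the standard layout for those rows (optional model number, residue name, chain ID, sequence number, optional insertion code). The names `missing_residues`, `count_by_chain` and `SAMPLE` are made up for this illustration, and the sample rows are invented rather than copied from 7dk3.

```python
import re

# One REMARK 465 data row: optional model number, residue name, chain ID,
# sequence number, optional insertion code. Header rows do not match.
ROW = re.compile(
    r"^REMARK 465\s+(?:\d+\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Za-z]?)\s*$"
)


def missing_residues(pdb_text):
    """Return (chain, residue_name, sequence_number) for each REMARK 465 row."""
    found = []
    for line in pdb_text.splitlines():
        match = ROW.match(line)
        if match:
            resname, chain, seqnum, _icode = match.groups()
            found.append((chain, resname, int(seqnum)))
    return found


def count_by_chain(residues):
    """Count how many missing residues were reported for each chain."""
    counts = {}
    for chain, _resname, _seqnum in residues:
        counts[chain] = counts.get(chain, 0) + 1
    return counts


# Invented sample in the REMARK 465 layout (NOT taken from 7dk3).
SAMPLE = """\
REMARK 465
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     VAL X    10
REMARK 465     SER X    11
REMARK 465     GLY Y    10
"""

print(count_by_chain(missing_residues(SAMPLE)))
```

To try it on the real entry, you would read the text of the downloaded PDB-format file (a local name such as `7dk3.pdb` is an assumption here) and pass it to `missing_residues` in place of `SAMPLE`.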

Following on from this, the REMARK 465 data is used to make a nice visual representation in the PDBsum pictorial summary produced for every structure deposited in the PDB. You can inspect this, along with the information revealed by the other tools detailed here, alongside your analysis in PyMOL, or even before you open the structure in your favorite molecular viewer, to see which parts of the sequence that particular structure will not inform you about. FirstGlance in Jmol will also guide you through assessing missing residues without you needing to dig through the text of the PDB file. Traditionally the PDB itself focused on what was present in the experiment, so it was often not clear from there what was not represented in the final deposited structure. This has finally changed, and some indicators are now present if you know where to look.

In your case you'd go to the main PDBsum page and enter 7dk3 in the 'PDB code' box. That will take you to the 7dk3 page, where you can select the 'Protein' tab to end up here. You'll see gaps around the 70th and 248th amino acids, as well as in other places. Finding those specifically without scanning the linear representation of the chain is made easier by using FirstGlance in Jmol right in your favorite web browser. Go to the main FirstGlance in Jmol page and enter the PDB code at the top. You'll end up here, and highlighted in the panel on the left of the first page, a few lines down, is "611 Missing Residues including 88−, 45+ charged amino acids!". (In fact, there are baskets highlighting all of these in the 3D structure on the right.) You can click on the 'Missing Residues' link in that line and the lower left panel will change to a report summarizing the specifics, such as the total number missing for each chain and the specific segments not represented in the model. This particular summary is nice because it quickly shows that although this structure has three identical chains in it, there are additional residues missing in only one of the monomers.
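If you want a similar per-chain summary of missing segments in a script, a rough sketch is below. This is not how FirstGlance does it, just one way to collapse a list of missing residue numbers into contiguous ranges. The function name `missing_segments` and the toy input are hypothetical, and the numbers are invented rather than taken from 7dk3.

```python
def missing_segments(residues):
    """Collapse (chain, sequence_number) pairs into contiguous ranges.

    Returns a dict mapping each chain ID to a list of (start, end) tuples,
    where consecutive sequence numbers are merged into one segment.
    """
    by_chain = {}
    for chain, seqnum in residues:
        by_chain.setdefault(chain, set()).add(seqnum)

    segments = {}
    for chain, numbers in by_chain.items():
        ranges = []
        for n in sorted(numbers):
            if ranges and n == ranges[-1][1] + 1:
                # Extends the current segment by one residue.
                ranges[-1] = (ranges[-1][0], n)
            else:
                # Starts a new segment.
                ranges.append((n, n))
        segments[chain] = ranges
    return segments


# Toy input with invented chain IDs and numbers (hypothetical, not 7dk3).
toy = [("X", 5), ("X", 6), ("X", 7), ("X", 20), ("Y", 5), ("Y", 6)]
for chain, ranges in missing_segments(toy).items():
    print(chain, ranges)
```

Comparing the ranges between chains is then a quick way to spot a segment that is missing in one monomer but modeled in the others.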

Finally, the structure page at the PDB itself has this information in a visual representation these days. Using the structure page at the PDB for 7dk3 as an example, you can scroll down on that main page to the 'Protein Feature View' section, which is just above the 'Experimental Data & Validation' section. This interactive view shows the sequence on the top line, but because this protein is quite large you'll have to zoom in quite a bit before you start to see the amino acids. Before you do that, though, note the labels on the left saying 'UNMODELED A', 'UNMODELED B', and 'UNMODELED C'. You'll see boxes highlighting the segments with amino acids missing in the final structure. Again, this nicely shows that one of the monomers is missing residues present in the others. You can hover over the extra box in the 'UNMODELED C' line to see that it covers 474-487. An expanded view of this information can be accessed by clicking on the 'Expand' link in the upper right of that section header, or by simply clicking on the 'Sequence' tab at the top of the structure page at the PDB for 7dk3.

Thank you for taking the time to respond. I followed through with your guidance and have learned a lot. I appreciate it very much.

They even have a side-by-side view of the sequence and the protein structure: https://www.rcsb.org/3d-sequence/7DK3