Biopython: Residue sizes differ by position
6.0 years ago
rod.god • 0

I'm currently working with a data set of PDB files and I'm interested in the sizes of the residues (number of atoms per residue). I noticed that the number of atoms, len(residue.child_list), differs between residues of the same type in different proteins. For example, a 'LEU' residue has 8 atoms in one protein but 19 in another!

My guess is that this is an error in the PDB file or in PDBParser(); either way, the differences are huge!

For example in the case of the molecule 3OQ2:

    r = model['B'][88]   # residue at chain B position 88
    r1 = model['B'][15]  # residue at chain B position 15

    In [287]: r.resname
    Out[287]: 'VAL'
    In [288]: r1.resname
    Out[288]: 'VAL'


But

    In [274]: len(r.child_list)
    Out[274]: 16
    In [276]: len(r1.child_list)
    Out[276]: 7


So even within a single molecule there is a difference in the number of atoms. I'd like to know whether this is biologically normal or whether something is wrong. Thank you.

Biopython Bioinformatics Protein-database • 1.3k views
6.0 years ago
João Rodrigues ★ 2.5k

The parser reads what you give it. Have a look at the PDB file (or see the excerpt below). One residue has hydrogen atoms assigned while the other doesn't, which is a side effect of very high-resolution crystal structures. If you had looked at the contents of child_list, you would have seen the problem immediately.

ATOM   1843  N   VAL B  15      43.162  21.984  17.104  1.00 16.22           N
ATOM   1844  CA  VAL B  15      44.075  21.333  16.172  1.00 18.72           C
ATOM   1845  C   VAL B  15      45.427  21.082  16.807  1.00 22.32           C
ATOM   1846  O   VAL B  15      45.950  21.920  17.538  1.00 28.12           O
ATOM   1847  CB  VAL B  15      44.229  22.144  14.857  1.00 19.22           C
ATOM   1848  CG1 VAL B  15      44.649  23.580  15.150  1.00 21.69           C
ATOM   1849  CG2 VAL B  15      45.222  21.474  13.921  1.00 20.65           C
.....
ATOM   2962  N   VAL B  88      33.193  42.159  23.916  1.00 11.01           N
ATOM   2963  CA  VAL B  88      33.755  43.168  24.800  1.00 12.28           C
ATOM   2964  C   VAL B  88      35.255  43.284  24.530  1.00 12.91           C
ATOM   2965  O   VAL B  88      35.961  42.283  24.451  1.00 14.78           O
ATOM   2966  CB  VAL B  88      33.524  42.841  26.286  1.00 12.81           C
ATOM   2967  CG1 VAL B  88      34.166  43.892  27.160  1.00 16.03           C
ATOM   2968  CG2 VAL B  88      32.020  42.727  26.586  1.00 17.67           C
ATOM   2969  H   VAL B  88      33.642  41.425  23.899  1.00 13.21           H
ATOM   2970  HA  VAL B  88      33.340  44.035  24.608  1.00 14.73           H
ATOM   2971  HB  VAL B  88      33.941  41.979  26.492  1.00 15.37           H
ATOM   2972 HG11 VAL B  88      34.011  43.670  28.081  1.00 19.23           H
ATOM   2973 HG12 VAL B  88      35.110  43.912  26.983  1.00 19.23           H
ATOM   2974 HG13 VAL B  88      33.777  44.746  26.959  1.00 19.23           H
ATOM   2975 HG21 VAL B  88      31.902  42.523  27.516  1.00 21.20           H
ATOM   2976 HG22 VAL B  88      31.596  43.562  26.377  1.00 21.20           H
ATOM   2977 HG23 VAL B  88      31.647  42.026  26.047  1.00 21.20           H  
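
The same check can be scripted. What follows is a minimal sketch, not part of the original answer: it uses only the Python standard library and assumes the standard PDB fixed-column layout (residue name in columns 18-20, chain in column 22, residue number in columns 23-26, element symbol in columns 77-78). The names PDB_EXCERPT and element_counts are made up for illustration.

```python
from collections import Counter, defaultdict

# ATOM records for VAL B 15 and VAL B 88, copied from the 3OQ2 excerpt above.
PDB_EXCERPT = """\
ATOM   1843  N   VAL B  15      43.162  21.984  17.104  1.00 16.22           N
ATOM   1844  CA  VAL B  15      44.075  21.333  16.172  1.00 18.72           C
ATOM   1845  C   VAL B  15      45.427  21.082  16.807  1.00 22.32           C
ATOM   1846  O   VAL B  15      45.950  21.920  17.538  1.00 28.12           O
ATOM   1847  CB  VAL B  15      44.229  22.144  14.857  1.00 19.22           C
ATOM   1848  CG1 VAL B  15      44.649  23.580  15.150  1.00 21.69           C
ATOM   1849  CG2 VAL B  15      45.222  21.474  13.921  1.00 20.65           C
ATOM   2962  N   VAL B  88      33.193  42.159  23.916  1.00 11.01           N
ATOM   2963  CA  VAL B  88      33.755  43.168  24.800  1.00 12.28           C
ATOM   2964  C   VAL B  88      35.255  43.284  24.530  1.00 12.91           C
ATOM   2965  O   VAL B  88      35.961  42.283  24.451  1.00 14.78           O
ATOM   2966  CB  VAL B  88      33.524  42.841  26.286  1.00 12.81           C
ATOM   2967  CG1 VAL B  88      34.166  43.892  27.160  1.00 16.03           C
ATOM   2968  CG2 VAL B  88      32.020  42.727  26.586  1.00 17.67           C
ATOM   2969  H   VAL B  88      33.642  41.425  23.899  1.00 13.21           H
ATOM   2970  HA  VAL B  88      33.340  44.035  24.608  1.00 14.73           H
ATOM   2971  HB  VAL B  88      33.941  41.979  26.492  1.00 15.37           H
ATOM   2972 HG11 VAL B  88      34.011  43.670  28.081  1.00 19.23           H
ATOM   2973 HG12 VAL B  88      35.110  43.912  26.983  1.00 19.23           H
ATOM   2974 HG13 VAL B  88      33.777  44.746  26.959  1.00 19.23           H
ATOM   2975 HG21 VAL B  88      31.902  42.523  27.516  1.00 21.20           H
ATOM   2976 HG22 VAL B  88      31.596  43.562  26.377  1.00 21.20           H
ATOM   2977 HG23 VAL B  88      31.647  42.026  26.047  1.00 21.20           H
"""


def element_counts(pdb_text):
    """Tally element symbols per (chain, residue number, residue name)."""
    counts = defaultdict(Counter)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        resname = line[17:20].strip()          # columns 18-20
        chain = line[21]                       # column 22
        resseq = int(line[22:26])              # columns 23-26
        element = line[76:78].strip().upper()  # columns 77-78
        counts[(chain, resseq, resname)][element] += 1
    return counts


counts = element_counts(PDB_EXCERPT)
for key, tally in sorted(counts.items()):
    total = sum(tally.values())
    print(key, "total:", total, "heavy:", total - tally["H"], "hydrogens:", tally["H"])
```

For this excerpt the totals should come out as 7 atoms for VAL 15 and 16 for VAL 88 (7 heavy atoms plus 9 hydrogens), which matches the len(child_list) values in the question.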

So are high-resolution crystal structures actually more accurate about the number of atoms, or are these hydrogen atoms an erratic side effect? Thank you for your answer.


You can consider them 'more accurate' in the sense that they provide a clearer electron density, which allows crystallographers to unambiguously determine the position of even small hydrogen atoms. The problem here is that apparently not all hydrogen atoms could be properly determined, so you get this discrepancy. Regardless, depending on your goal, you may or may not need to bother about them. You can, for example, add all hydrogen atoms with a generic force field and energy-minimize the structure (e.g. with GROMACS), or use the MolProbity and WHAT IF servers for a similar purpose.
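
If the goal is only to compare residue sizes, another option is to count just the non-hydrogen atoms. The sketch below is an illustration rather than a recommended procedure. It relies on the fact that iterating over a Biopython Residue yields Atom objects that carry an element attribute. The helper names heavy_atoms and fake_residue are hypothetical, and the stand-in residues exist only so the block runs without a structure file.

```python
from types import SimpleNamespace


def heavy_atoms(residue):
    """Return the atoms whose element is not hydrogen (or deuterium)."""
    return [atom for atom in residue if atom.element.upper() not in ("H", "D")]


def fake_residue(elements):
    """Build a stand-in residue: a list of objects with an .element attribute."""
    return [SimpleNamespace(element=e) for e in elements]


# Element composition of the two valines from the PDB excerpt in this thread.
# With Biopython you would pass model['B'][15] and model['B'][88] instead.
val_15 = fake_residue(["N", "C", "C", "O", "C", "C", "C"])
val_88 = fake_residue(["N", "C", "C", "O", "C", "C", "C"] + ["H"] * 9)

print("all atoms:  ", len(val_15), len(val_88))
print("heavy atoms:", len(heavy_atoms(val_15)), len(heavy_atoms(val_88)))
```

Counted this way, both valines have the same size, so residues with and without modelled hydrogens become directly comparable.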

P.S. If the answer is satisfactory, please mark it as 'correct'.


Thank you again.